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Abstract 

The whole frame of interconnections in complex networks hinges on a specific set 
of structural nodes, much smaller than the total size, which, if activated, would cause 
the spread of information to the whole network |Tj; or, if immunized, would prevent 
the diffusion of a large scale epidemic [2j, [3]. Localizing this optimal, i.e. minimal, 
set of structural nodes, called infiuencers, is one of the most important problems in 
network science [4l[5]. Despite the vast use of heuristic strategies to identify influential 
spreaders [61 - 114] . the problem remains unsolved. Here, we map the problem onto 
optimal percolation in random networks to identify the minimal set of infiuencers, 
which arise by minimizing the energy of a many-body system, where the form of 
the interactions is fixed by the non-backtracking matrix ]15| of the network. Big 
data analyses reveal that the set of optimal infiuencers is much smaller than the one 
predicted by previous heuristic centralities. Remarkably, a large number of previously 
neglected weakly-connected nodes emerges among the optimal infiuencers. These are 
topologically tagged as low-degree nodes surrounded by hierarchical coronas of hubs, 
and are uncovered only through the optimal collective interplay of all the infiuencers in 
the network. Eventually, the present theoretical framework may hold a larger degree 
of universality, being applicable to other hard optimization problems exhibiting a 
continuous transition from a known phase [16] . 
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The optimal influence problem was initially introduced in the context of viral marketing 
PP, and its solution was shown to be NP-hard |1] for a generic class of linear threshold 
models of information spreading [nins]. Indeed, finding the optimal set of influencers is a 
many-body problem in which the topological interactions between them play a crucial role 
On the other hand, there has been an abundant production of heuristic rankings to 
identify influential nodes and ’’superspreaders” in networks [SHIllIIH]. The main problem 
is that heuristic methods do not optimize a global function of influence. As a consequence, 
there is no guarantee of their performance. 

Here we address the problem of quantifying node’s influence by finding the optimal (i.e. 
minimal) set of structural influencers. After defining a unified mathematical framework for 
both immunization and spreading, we provide its optimal solution in random networks by 
mapping the problem onto optimal percolation. In addition, we present Cl (which stands 
for Collective Influence), a scalable algorithm to solve the optimization problem in large 
scale datasets. The thorough comparison with competing methods (Methods Section]^ j2Uj ) 
ultimately establishes the major performance of our algorithm. By taking into account 
collective influence effects, our optimization theory identifies a new class of strategic influ¬ 
encers, called weak-nodes, which outrank the hubs in the network. Thus, the top influencers 
are highly counterintuitive: low degree nodes play a major broker role in the network, and 
despite being weakly connected, can be powerful influencers. 

The problem of finding the minimal set of activated nodes [miiH] to spread information 
to the whole network |1] or to optimally immunize a network against epidemics m can be 


exactly mapped onto optimal percolation (see Methods Section IIB). This mapping provides 
the mathematical support to the intuitive relation between influence and the concept of 
cohesion of a network: the most influential nodes are the ones forming the minimal set 
that guarantees a global connection of the network [a El [in]. We call this minimal set the 
“optimal influencers” of the network. At a general level, the optimal influence problem can 
be stated as follows: And the minimal set of nodes which, if removed, would break down 
the network in many disconnected pieces. The natural measure of influence is, therefore, 
the size of the largest (giant) connected component as the influencers are removed from the 
network. 

We consider a network composed of N nodes tied with M links with arbitrary degree 
distribution P{k). Let us suppose we remove a certain fraction q of the total number 
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of nodes. It is well known from percolation theory [21] that, if we choose these nodes 
randomly, the network undergoes a structural collapse at a certain critical fraction where 
the probability of existence of the giant connected component vanishes, G = 0. The optimal 
influence problem corresponds to hnding the minimum fraction qc of influencers to fragment 
the network: Qc = min{g G [0,1] : G{q) = 0}. 

Let the vector n = (rii,... ,nAr) represents which node is removed (n* = 0, influencer) 
or left (rij = 1, the rest) in the network [q = 1 — ^/N and consider a link from 

i ^ j. The order parameter of the influence problem is the probability that i belongs to 
the giant component in a modihed network where j is absent, [221 E3|- Clearly, in the 
absence of a giant component we hnd = 0} for all i — )■ j. The stability of the solution 

= 0} is controlled by the largest eigenvalue A(n;g) of the linear operator Jli dehned 


on the 2M x 2M directed edges as Ai 

random graphs (see Fig. and Methods Section 0: 




We hnd for locally-tree like 


■At ( 1 ) 

where is the non-hacktracking matrix of the network [T5ll21|. The matrix 

has non-zero entries only when [k ^ ^ j) form a pair of consecutive non-backtracking 


directed edges, i.e. (/c — j) with k ^ j. In this case = 1 (Eq. S13). Powers 

of the matrix B count the number of non-backtracking walks of a given length in the network 
(Fig. [^d) [21], much in the same way as powers of the adjacency matrix count the number of 
paths [5] . Operator B has recently received a lot of attention thanks to its high performance 
in the problem of community detection [25l |26]. Below, we show its topological power in 
the problem of optimal percolation. 

Stability of the solution {vi^j = 0} requires A(n; q) < 1. The optimal influence problem 
for a given q {> qc) can be rephrased as hnding the optimal conhguration n that minimizes 
the largest eigenvalue A(n;g) (Fig. [^). The optimal set n* of Nqc inhuencers is obtained 
when the minimum of the largest eigenvalue reaches the critical threshold: 


\{n*;qc) = 1 . 


( 2 ) 


The formal mathematical mapping of the optimal inhuence problem to the minimization of 
the largest eigenvalue of the modihed non-backtracking matrix for random networks, Eq. 
(|^, represents our hrst main result. 
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An example of non-optimized solution corresponds to choosing n* at random and decou¬ 
pled from the non-backtracking matrix [23112Z| (random percolation [21], Methods Section 


IID). In the optimized case we seek to derandomize the selection of the set Uj = 0 and 


optimally choose them to hnd the best conhguration n* with the lowest Qc according to Eq. 
(|^. The eigenvalue A(n) (from now on we omit q in A(n;g) = A(n), which is always kept 
hxed) determines the growth rate of an arbitrary vector wq with 2M entries after ^ iterations 
of the matrix Ad: |w£(n)| = = |Ad^Wo| = (wo|(Ad^)'^Ad^|wo)^/^ ~ g^iogA(n)_ 

The largest eigenvalue is then calculated by the Power Method: 


A(n) = lim 




|wKn)| 

|wo| 


l/£ 


(3) 


Equation ([^ is the starting point of an (inhnite) perturbation series which provides 
the exact solution to the many-body influence problem in random networks and therefore 
contains all physical effects, including the collective influence. In practice, we minimize the 
cost energy function of influence |w£(n)| in Eq. ([^ for a hnite 1. The solution rapidly 
converges to the exact value at £ —>■ oo; the faster the larger the spectral gap. We hnd for 


£ > 1 as a leading order in 1/iV (Methods Section HE): 


N 


|w£(n)|2 = ^(fci - 1) ^ I Y[ nk]{kj-l), 

je9Ball(i,2f—1) \k&'P 2 l-l(i,j) 


(4) 


i=l 


where Ball(b^) is the set of nodes inside a ball of radius £ (dehned as the shortest path) 
around node i, c?Ball(q t) is the frontier of the ball and is the shortest path of length 

^ connecting i and j (Fig. [^). 

The hrst collective optimization in Eq. (|4|) is £ = 1. We hnd |wi(n)|2 = “ 


l){kj — l)ninj (Eq. ^39). This term is interpreted as the energy of an antiferromagnetic 
Ising model with random bonds in a random external held at hxed magnetization, which 
is an example of a pair-wise NP-complete spin-glass whose solution is found in Methods 


Section III with the cavity method 


For £ > 2, the problem can be mapped exactly to a statistical mechanical system with 
many-body interactions which can be recast in terms of a diagrammatic expansion, Eqs. 


41T49, For example, |w 2 (n)f leads to 4-body interactions (Eq. S45), and, in general, the 


energy cost |w£(n)p contains 2f'-body interactions. As soon as £ > 2, the cavity method 
becomes much more complicated to implement and we use another suitable method, called 
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extremal optimization (EO) [29] (Methods Section IV). This method estimates the true 
optimal value of the threshold by hnite size scaling following extrapolation to £ —cxd 
(E xtended Data Fig. |^. However, EO is not scalable to hnd the optimal conhguration in 
large networks. Therefore, we develop an adaptive method, which performs excellently in 
practice, preserves the features of EO, and is highly scalable to present-day big-data. 

The idea is to remove the nodes causing the biggest drop in the energy function Eq. (|^. 
First, we dehne a ball of radius i around every node (Fig. 01 ) Then, we consider the 
nodes belonging to the frontier c}Ball(i,f') and assign to node i the collective influence (Cl) 
strength at level i following Eq. (|^ (see Methods Section |V] for implementation and Section 


V A for minimizing G(q') 7 ^ 0): 


ci,(i) = ft-i) fe-l) 

jeaBall(j/) 


(5) 


We notice that, while Eq. (|^ is valid only for odd radii of the ball, Ch(i) is dehned also for 
even radii. This generalization is possible by considering an energy function for even radii 
analogous to Eq. (|^, as explained in Methods Section IIG The case of one-body interaction 
with zero radius £ = 0 (Eq. leads to the high-degree (HD) ranking (Eq. [TU] . 

The collective influence Eq. ([^ is our second and most important result since it is the 
basis for the highly scalable and optimized Cl-algorithm which follows. In the beginning all 
the nodes are present: n* = 1 for all i. Then, we remove node i* with highest Ch and set 
Ui* = 0. The degrees of its neighbours are decreased by one, and the procedure is repeated to 
hnd the new top Cl node to remove. The algorithm is terminated when the giant component 
is zero. By increasing the radius i of the ball we obtain better and better approximations 
of the optimal exact solution at £ —)■ cxd (for hnite networks, I does not exceed the network 
diameter). 

The collective inhuence Ch for £ > 1 has a rich topological content, and consequently 
can tell us more about the role played by nodes in the network than the non-interacting 
high-degree hub removal strategy at £ = 0 , CIq. The augmented information comes from 
the sum in the r.h.s, which is absent in the naive high-degree rank. This sum contains the 
contribution of the nodes living on the surface of the ball surrounding the central vertex i, 
each node weighted by the factor kj — 1. This means that a node placed at the center of 
a corona irradiating many links— the structure hierarchically emerging at different Glevels 
as seen in Fig. — can have a very large collective influence, even if it has a moderate or 
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low degree. Such “weak-nodes” can outrank nodes with larger degree that occupy mediocre 
peripherical locations in the network. The commonly used word ’weak’ in this context sounds 
particularly paradoxical. It is, indeed, usually used as a synonymous for a low-degree node 
with an additional ’’bridging” property, which has resisted a quantitative formulation. We 
provide this dehnition through Eq. ([^, according to which weak nodes are, de facto, quite 
strong. Paraphrasing Granovetter’s conundrum Eq. ([^ quantihes the “strength of 
weak nodes”. 

The Cl-algorithm scales as ~ 0{N log N) by removing a hnite fraction of nodes at each 


step (Methods Section VB). This high scalability allows us to hnd top influencers in current 
big-data social media and optimal immunizators in large-scale populations at the country 
level. The applications are investigated next. 

Figure shows the optimal threshold for random Erdos-Renyi (ER) network |S] 
(marked by the vertical line) obtained by extrapolating the EO solution to inhnite size. 


—)■ cx), and £ —)• oo (Methods Section IV). In the same figure we compare the optimal 
threshold against the heuristic centrality measures: high-degree (HD) |9], high-degree adap¬ 
tive (HDA), PageRank (PR) [7], closeness centrality (CC) [6], eigenvector centrality (EC) 
[0], and k-core [12] (see Methods Section]^ for dehnitions). Methods Section VI and VII show 
the comparison with the remaining heuristics (sum and the Belief Propagation method of 
HI, respectively, which have worst computational complexity (and optimality), and cannot 
be applied to the network sizes used here. Remarkably, at the optimal value Qc predicted 
by our theory, the best among the heuristic methods (HDA, PR and HD) still predict a 
giant component ~ 50% — 60% of the whole original network. Furthermore, the influencer 
threshold predicted by Cl approximates very well the optimal one, and, notably. Cl outper¬ 
forms the other strategies. Figure]^ compares Cl in scale-free (SF) network |S] against the 
best heuristic methods, i.e. HDA and HD. In all cases. Cl produces smaller threshold and 
smaller giant component (Fig. |^). 

As an example of information spreading network, we consider the web of Twitter users 
(Methods Section VHI [IH]). Figure]^ shows the giant component of Twitter when a fraction 
q of its influencers is removed following CL It is surprising that a lot of Twitter users with 
a large number of contacts have a mild influence on the network, as witnessed by the fact 
that, when Cl (at i = 5) predicts a zero giant component (and so it exhausts the number 
of optimal influencers), the scalable heuristic ranks (HD, HDA, PR and k-core) still give a 
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pretty big giant component of the order of 30-70% of the entire network, and, inevitably, hnd 
a remarkably larger nnmber of (fake) inflnencers which is 50% larger than that predicted by 
Cl (Fig, If and Methods Section [VIII[ ). One canse for the poor performance of the high- 
degree rank is that most of the hnbs are clustered (rich-club effect), which gives a mediocre 
importance to their contacts. As a consequence, hubs are outranked by nodes with lower 
degree surrounded by coronas of hubs (shown in the detail of Fig. &) , i.e. the weak-nodes 
predicted by the theory (Fig. [^). 

Finally, we simulate an immunization scheme on a personal contact network built on 
the phone calls performed by 14 million people in Mexico (Methods Section [Ix| . Figure]^ 
shows that our method saves a large amount vaccines stockpile or, equivalently, hnd the 
smallest possible set of people to quarantine outranking the scalable heuristics in large real 
networks as well. Thus, while the mapping of the inhuencer identihcation problem onto 
optimal percolation is strictly valid for locally tree-like random networks, our results may 
apply also for real loopy networks, provided the density of loops is not excessively large. 

Our solution to the optimal inhuence problem shows its importance in that it helps 
to unveil hitherto hidden relations between people, as witnessed by the weak-node effect. 
This, in turn, is the byproduct of a broader notion of inhuence, lifted from the individual 
non-interacting point of view [6HT21 [m EQ] to the collective sphere: inhuence is an emer¬ 
gent property of collectivity and top inhuencers arise from the optimization of the complex 
interactions they stipulate. 
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FIG. Non-backtracking (NB) matrix and weak-nodes. a, The largest eigen¬ 
value A of exemplified on a simple network. The optimal strategy for immunization and 
spreading minimizes A by removing the minimum number of nodes (optimal influencers) 
that destroys all the loops. Left panel; The action of the matrix A4 is on the directed 
edges of the network. The entry Al 2 ->- 3 , 3^.5 = n 3 B 2 ^ 3 , 3^5 = encodes node 3’s occupancy 
(ns = 1) or vacancy (ns = 0). In this particular case, the largest eigenvalue is A = 1. Center 
panel: Not-optimal removal of a leaf, n 4 = 0, which does not decrease A. Right panel; 
Optimal removal of a loop, ns = 0, which decreases A to zero, b, A NB walk is a random 
walk that is not allowed to return back along the edge that it just traversed. We show a 
NB open walk (£ = 3), a NB closed walk with a tail {i = 4), and a NB closed walk with 
no tails (£ = 5). The NB walks are the building blocks of the diagrammatic expansion to 
calculate A. c, Representation of the global minimum over n of the largest eigenvalue A of 
A4 vs q. When q > qc, the minimum is at A = 0. Then, G = 0 is stable (still, non-optimal 
conhgurations exist with A > 1 for which G > 0). When q < q^., the minimum of the largest 
eigenvalue is always A > 1, the solution G = 0 is unstable, and then G > 0. At the optimal 
percolation transition, the minimum is at n* with A(n*, gc) = 1- For g = 0, we hnd A = k — 1 
{k = {k"^)/(k)) which is the largest eigenvalue of B for random networks [25] with all nodes 
present (n* = 1). When A = 1, the giant component is reduced to a tree plus one single 
loop (unicyclic graph), which is suddenly destroyed at the transition g^ to become a tree, 
causing the abrupt fall of A to zero, d, Ball(i, £) of radius i around node i is the set of nodes 
contained in the grey region and dBall is the set of nodes on the boundary. The shortest 
path from i to j is colored in red. e, Example of a weak-node: a node with a small number 
of connections surrounded by hierarchical coronas of hubs at different Glevels. 


FIG. Exact optimal solution and performance of Cl in synthetic networks. 

a, G(g) in ER network {N = 2 x 10^, (k) = 3.5, error bars are s.e.m. over 20 realizations) 
for the true optimal solution with EO (x). Cl, HDA, PR, HD, CC, EC and k-core. The 
other methods are not scalable and perform worst than HDA and are treated in Methods 


Sections VI and VH Cl is close to the optimal g°P* ~ 0.193 obtained with EO in Methods 


Section IV Note that EO can estimate the extrapolated optimal value of qc, but it cannot 
provide the optimal conhguration for large systems. Inset: qc (obtained at the peak of the 
second largest cluster) for the three best methods vs (k). b, G(g) for SF network with degree 
exponent 7 = 3, maximum degree /Cmax = 10^ and N = 2x 10^ (error bars are s.e.m. over 20 
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realizations). Inset: qc vs 7 . The continuous blue line is the HD analytical result computed 


in Methods Section IIG c, SF network with 7 = 8 after the removal of the 15% of nodes, 
using the three methods. Cl produces a much reduced giant component (red nodes). 

FIG. 1^ Performance of Cl in large-scale real social networks, a, Giant com¬ 
ponent G{q) of Twitter users [19] {N = 469,013) computed using Cl, HDA, PR, PR and 
k-core strategies (other heuristics have prohibitive running times for this system size), b, 


Percentage of fake influencers or false positives (PFI, Eq. ^120) in Twitter as a function 
of q, deftned as the percentage of non-optimal influencers identifted by HD algorithm in 
comparison with CL Below PFI reaches as much as ~ 40% indicating the failure of HD 
in optimally Ending the top influencers. Indeed, to obtain G = 0, HD has to remove a much 
larger number of fake influencers, which at g™ reaches PFI ~ 48%. c, Example out of the 
many weak-nodes found in Twitter; the crucial influencer missed by all heuristic strategies, 
d, G{q) for a social network of 1.4 x 10^ mobile phone users in Mexico representing an 
example of big-data to test the scalability and performance of the algorithm. Cl immunizes 
this social network using half a million less people than the best heuristic strategy (HDA), 
saving ~35% of vaccine stockpile. 
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Extended Data Fig. HD high -degree threshold, a, HD influence threshold Qc 
as a function of the degree distribution exponent 7 of scale-free networks in the ensemble 
with fcmax = and N —)■ 00 . The curves refer to different values of the minimum 

degree m: 1 (red), 2 (blue), 3 (black). The fragility of SF networks (small Qc) is notable 
for m = 1, the case calculated in [10]. In this case, the network contains many leaves, and 
reduces to a star at 7 = 2 which is trivially destroyed by removing the only single hub, 
explaining the general fragility in this case. Furthermore, in this case, the network becomes 
a collection of dimers with k = 1 when 7 —)■ 00 , which is still trivially fragile. This explains 
why gc 0 as 7 —)■ 00 , as well. Therefore, the fragility in the case m = 1 has its roots in 
these two limiting trivial cases. Removing the leaves (m = 2) results in a 2-core, which is 
already more robust. For the 3-core m = 3, gc ~ 0.4 — 0.5 provides a quite robust network, 
and has the expected asymptotic limit to a non-zero Qc of a random regular graph with 
A; = 3 as 7 —>■ 00 , gc —t {k — 2)/{k — 1 ) = 0.5. Thus, SF networks become robust in these 
more realistic cases and the search for other attack strategies becomes even more important, 
b, HD influence threshold gc as a function of the degree distribution exponent of scale-free 
networks with minimum degree m = 2 in the ensemble where fcmax is fixed and does not 
scale with N. The curves refer to different values of the cut-off k^as.^- 10^ (red), 10^ (green), 
10® (blue), 10® (magenta), and fcmax = 00 (black), and show that for typical fc max degree of 
10®, for instance in social networks, the network is fairly robust with ~ 0.2 for all 7 . 

Extended Data Fig. RS estimation of the maximum eigenvalue. Main panel: 
the eigenvalue Af®(g) obtained by minimizing the energy function S{s) with the RS cavity 
method. The curve was computed on a Erdos-Renyi graph of = 10, 000 nodes and average 
degree (k) = 3.5 and then averaged over 40 realizations of the network. Inset: Comparison 
between the cavity method and extremal optimization for an ER graph of (k) = 3.5 and 
N = 128. The curves are averaged over 200 realizations (error bars are s.e.m.). 

Extended Data Fig. EO estimation of the maximum eigenvalue. Eigenvalue 
A(g) obtained by minimizing the energy function T(n) with rEO, plotted as a function of 
the fraction of removed nodes g. The panels are for different orders of the interactions. The 
curves in each panel refer to different sizes of Erdos-Renyi networks with average connectivity 
{k) = 3.5. Each curve is an average over 200 instances (error bars are s.e.m.). The value gc 
where A(gc) = 1 is the threshold for a particular N and many-body interaction. 

Extended Data Fig. Estimation of optimal threshold g°P* with EO. a, Critical 
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threshold Qc as a function of the system size N obtained with EO from from Extended Data 
Fig. I of ER networks with {k) = 3.5 and varying size. The curves refer to different orders 
of the many-body interactions. The data show a linear behaviour as a function of 
typical of spin glasses, for each many-body interaction p. The extrapolated value q'^{p) 
is obtained at the y-intercept. b, Thermodynamical critical threshold q^{p) as a function 
of the order of the interactions p from a. The data scale linearly with 1/p. From the y- 
intercept of the linear £t we obtain the thermodynamical limit of the inhnite-body optimal 
value g/P* = g“(p —)■ oo) = 0.192(9). 

Extended Data[^ Comparison of the Cl algorithm for different radius i of the 

Ball(£). We use i = 1,2, 3,4, 5, on a ER graph with average degree (k) = 3.5 and N = 10® 
(the average is taken over 20 realizations of the network, error bars are s.e.m.). For £ = 3 
the performance is already practically indistinguishable from £ = 4, 5. The stability analysis 
we developed to minimize qc is strictly valid only when G = 0, since the largest eigenvalue 
of the modihed NB matrix controls the stability of the solution G = 0, and not the stability 
of the solution G > 0. In the region where G > 0 we use a simple and fast procedure to 
minimize G explained in Section V A This explains why there is a small dependence on £ 
having a slightly larger G for larger £, when G > 0 in the region q k. 0.15. 

Extended Data Fig. Illustration of the algorithm used to minimize G(g) for 
q < qc- Starting from the completely fragmented network at g = qc, nodes are reinserted 
with the following criterion: each node is assigned and index c(i) given by the number of 
clusters it would join if it were reinserted in the network. For example, the red node has 
c(red) = 2, while the blue one has c(blue) = 3. The node with the smallest c{i) is reinserted 
in the network: in this case the red node. Then the c(z)’s are recalculated and the new 
node with the smallest c{i) is found and reinserted. The algorithm is repeated until all the 
removed nodes are reinserted in the network. 

Extended Data Fig. Test of the decimation fraction. Giant component as a 
function of the removed nodes using Cl, for an ER network of = 10® nodes and average 
degree {k) = 3.5. The prohles of the curves are drawn for different percentages of nodes 
hxed at each step of the decimation algorithm. 

Extended Data Fig. Comparison of the performance of Cl, BC, and EGP. 
We also include HD, HDA, EC, CC, k-core, and PR. We use a scale-free network with degree 
exponent 7 = 2.5, average degree {k) = 4.68, and N = 10^. We use the same parameters as 
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in Ref. [TT] . 

Extended Data Fig. Comparison with BP. a, Fraction of infected nodes / as 
a function of the fraction of immunized nodes q in SIR from BP solution. We use a ER 
random graph of iV = 200 nodes and average degree (k) = 3.5. The fraction of initially 
infected nodes is p = 0.1 and the inverse temperature /3 = 3.0. The prohles are drawn for 
different values of the transmission probability w: 0.4 (red curve), 0.5 (green), 0.6 (blue), 
0.7 (magenta). Also shown are the results of the hxed density BP algorithm (open circles), 
b, Chemical potential p as a function of the immunized nodes q from BP. We use a ER 
random graph of = 200 nodes and average degree (k) = 3.5. The fraction of the initially 
infected nodes is p = 0.1 and the inverse temperature /3 = 3.0. The prohles are drawn for 
different values of the transmission probability w: 0.4 (red curve), 0.5 (green), 0.6 (blue), 
0.7 (magenta). Also shown are the results of the hxed density BP algorithm (open circles) 
for the region where the chemical potential is non convex, c, Comparison between the giant 
components obtained with Cl, HDA, HD and BP. We use an ER network of A^ = 10^ and 
{k) = 3.5. We also show the solution of Cl from Fig. for 77 = 10^. We hnd in order 
of performance: Cl, HDA, BP, and HD. (The average is taken over 20 realizations of the 
network, error bars are s.e.m.) d, Comparison between the giant components obtained with 
Cl, HDA, HD and BPD. We use a SF network with degree exponent 7 = 3.0, minimum 
degree kmin = 2, and N = 10^ nodes. 


Extended Data Fig. 10 Fraction of infected nodes /(g) as a function of the 
fraction of immunized nodes g in SIR from BP. We use the following parameters: 
initial infected people p = 0.1, and transmission probability w = 0.5. We use an ER 
network of A^ = 10^ nodes and {k) = 3.5. We compare Cl, HDA and BP. All strategies give 
similar performance due to the large value of the initial infection p which washes out the 
optimization performed by any sensible strategy, in agreement with the results of [H], Fig 
12a. 
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I. HEURISTIC METHODS USED TO IDENTIFY INFLUENTIAL SPREADERS 
IN COMPLEX NETWORKS 


In this section we describe the existing heuristic algorithms in the literature that have 
been used to identify influential spreaders and superspreaders in networks. We use these 
heuristics to compare with our collective theory of influence. A common feature of the 
heuristic methods is that they are not designed from hrst principles and therefore do not 
necessarily optimize an influence measure. Instead, they are based on intuitive ideas about 
what is an influencer. Besides, the heuristics constitute ranking of nodes lacking the collec¬ 
tive influence arising from considering all the influencers at once. On the other hand, the 
theoretical framework for maximizing the spread of influence of Kempe et al. [1] and the 
Belief Propagation theory for the optimal immunization problem of Altarelli et al. mm 
contain the necessary optimization of influence. The greedy algorithm considered in [1] has 
prohibitive running time for all the networks considered in our work. Detailed comparison 


with the Belief Propagation is done in Methods Section VII 


High-Degree (HD) [21 IHl 110] . In the HD method nodes are ranked by degree, and 
sequentially removed starting from the node of highest degree. One of the limitations of 
this method is the fact that hubs may form tightly-knit groups called “rich-clubs” [311132] . 
Strategies based on high-degree will highly rank these rich-club hubs. On the other hand, an 
optimized scheme will target only one of them to avoid overlap between the already attacked 
areas in the network. High Degree Adaptive (HDA) is the adaptive version where the 
degree of the remaining nodes is recomputed after each node removal. 


PageRank (PR) [7]. This is the famous Google’s algorithm for ranking websites. It was 
proposed for first time in [7] to “condensing every page in the World Wide Web into a single 
number, its PageRank. PageRank is a global ranking of all web pages, regardless of their 
content, based solely on their location in the Web’s graph structure.” PR can be thought 
of as the most successful rank, ever. At its heart, it is another eigenvector centrality. It 
computes the probability that, if someone follows links on the web at random, performing 
a random walk of clicks, he/she eventually hits your website. The higher this chance, the 
higher the PR of the website. Therefore, sites that get linked more are considered reputable, 
and, linking to other websites, they pass that reputation along. Thus, the shortcoming with 
PR comes from the fact that PR takes node’s score into account when calculating other’s 
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scores. In other words, a high-PR site may confer a mnch higher score to otherwise nnpopular 
sites it happens to link. Notice that in our algorithm using the non-backtracking operator 
this problem is cured nicely, since the influence is computed by ’’ignoring” the node you 
come from. 

K-core [12]. K-core ranking is based on the k-shell decomposition of the network. Each 
node is assigned the k-shell number, ks, i.e. the order of the shell it belongs to. In k- 
shell decomposition, we hrst remove all nodes with degree k = 1 and continue pruning the 
network iteratively until there is no node with k = 1. These removed nodes belong to the 
peripheric k-shell with index kg = 1- Similarly, the next k-shells are dehned until all nodes 
are pruned and we get to the kcore of the network. The rank based on kcore produces good 
results in identifying single spreaders individually, but has a poor performance for multiple 
spreaders (the case considered in the present manuscript), because putting spreaders in the 
same k-shell gives a marginal or null advantage, as recognized in [T2]. That is, ranking 
the spreaders one by one, one may hnd that the best of them are located in the core. 
However, when considering the maximization of influence of all spreaders at the same time, 
collective effects coming from interactions between the spreaders via overlap of their spheres 
of influence are crucial: even if the top spreader is in the core, the next spreader most 
probably will not be in the core, because the core is already infected by the hrst spreader. 
Indeed, these interactions between spreaders are what makes the problem hard to solve 
(NP-hard as shown in |1]). Thus, even if the best individually-ranked spreaders might be 
located in the kcore, their collective inhuence is determined by their full set of interactions. 
Therefore, for multiple spreaders, the kshell ranking is not optimal, although, choosing core 
nodes separated by a distance increases their optimality as already shown in [T2|. 

Eigenvector Centrality (EC) [33]. It is the eigenvector corresponding to the largest 
eigenvalue of the adjacency matrix. Node rank is the corresponding entry of the eigenvector. 
Nodes are removed starting from the highest rank. This method is not very powerful, 
especially for the case of SF networks, where most of the weight may be carried by few 
nodes (hubs), while the others have vanishingly small weights, and thus they are not properly 
ranked. 

Closeness Centrality (CC) [33] . Closeness centrality measures how close a vertex is 
to all other vertices in the graph. More precisely CC at node i is the inverse of the average 
distance to all other nodes. Nodes are ranked according to their CC from the highest to 
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the lowest score, and removed accordingly. A property of CC is that it tends to give high 
scores to individnals who are near the center of local clnsters (i.e. network commnnities), 
and hence it over-allocates spreaders (or immnnized nodes) next to each other. Moreover, 
it comes with a high compntational cost that prevents the application to large networks. 

Methods Section VI compares resnlts with betweenness centrality |35] and equal- 


graph-partitioning [TT] which present prohibitive rnnning times for the large-scale net¬ 
works nsed here and present worst performance than other henristics. 


II. COLLECTIVE THEORY OF OPTIMAL INFLUENCE 

Nodes forming complex networks play different roles, depending on the process in which 
they participate [5]. Their inherent strength and weakness emerge collectively from the 
pattern of interactions with the other components. Nonetheless, it is a common practice 
to qnantify node’s importance in a network 018], for example social rank order [32|, by 
individnal node’s attribntes snch as the amonnt of its connections la El CO], betweenness 
and eigenvector centralities 0, or its closeness to the core [12] • This attitnde is, nowadays, 
amplihed by the angmented reality provided by virtnal social networks. This idea has also 
permeated to other helds and it is common to strategies of immnnization mm, viral 
spreading of information and marketing in social media mm, as well as targeted attack 
schemes to infrastrnctnre networks laiio]. However, individnal node ranking is an ambignons 
dehnition of node’s inflnence, for it considers the inflnencers as isolated entities and not in 
interaction with each other. Yet, there has been an abnndant prodnction on the snbject 
of identifying most inflnential nodes and ” snperspreaders” nsing snch rankings 0 Cl El- 
[1211191 ED]. The main problem is that all these methods do not optimize an objective 
global fnnction of inflnence. Instead, they are based on assnmptions abont the importance 
of individnal properties of the node [2Q] and, inevitably, they fail to take into acconnt the 
collective inflnence of the whole set of nodes. As a conseqnence, there is no gnarantee of 
their performance. 

On the other hand, a theoretical framework taking into acconnt a global maximization 
of inflnence was ontlined by Kempe et al. 0 in the form of a discrete global optimization 
problem for diffnsion of information models snch as the Linear Threshold Model (LTM) 
of Granovetter ini and other variants [18], whose solntion is proved to be NP-hard and is 
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approximated by a greedy algorithm [1] . This approach makes leverage on a special attribute 
of the function of influence to be optimized, called submodularity, which express the decrease 
in the gain in the output of the process after an increment of the input factors. This 
’’diminishing return” property is often lost in many optimization problems. In particular, 
submodularity does not apply, in general, to the giant connected component in optimal 
percolation problem, which is the function we minimize. It should be said that for some 
LTMs, of the type treated in |Tj, the objective function of influence to be maximized can 
be proven to be submodular. However, this does not hold true in general for other classes 
of LTMs, for example in the case of a hxed choice of the thresholds, as explicitly stated 
in |1], and which represents the type of problem studied in this paper. As a consequence 
even the greedy algorithm does not provide a stable approximation to the optimal solution. 
Furthermore, greedy searches are not scalable and therefore not applicable to current day 
big data in social media and population immunization problems. In our case we face these 
difficulties. 

Another pioneering approach to the problem of influence optimization and immunization 
is outlined in Refs. [I3l [H]. In particular, in Ref. |ll], the authors use a very interest¬ 
ing and principled method, based on Belief Propagation (BP), to minimize the expected 
infection outbreak in an epidemic process (modeled as susceptible-infected-recovered, SIR, 
or susceptible-infected-susceptible, SIS) for a given number of immunized nodes, and for 
a given choice of the parameters of the model, i.e., transmission probability w and initial 
fraction of infected individuals, p. While the method is able to hnd nearly optimal solutions 
to the problem, it becomes unfeasible when p —)• 0, which corresponds to the optimal influ¬ 
ence problem treated here, because the time complexity of the algorithm diverges as p~^ for 


p —0. A full comparison between our theory and BP is performed in Methods Section VII 


From the theoretical standpoint, our main result is the discovery of a method to map 
the problem of optimal influence onto the computation of the minimal set of nodes that 
minimizes the largest eigenvalue of the Non-Backtracking matrix of the network. This 
operator has recently received a lot of attention thanks to its high performance in the 
problem of community detection |25l [26]. We show its formidable topological power in the 
problem of optimal influence. The problem we set up is, in its most general formulation, 
intractably hard. We present a perturbative solution along with a very fast algorithm, that 
we use to simulate an optimized immunization/quarantine and superspreading protocol on 
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very large real networks. The naive strategy, corresponding to the lowest approximation, 
is given by the attack on the high degree nodes. The hrst non-trivial attack strategy is 
equivalent to hnd the ground state of a spin-glass like system, i.e., an anti-ferromagnetic Ising 
model with random bonds in a random external held at hxed magnetization. Higher order 
approximations produce superior performance compared to previous heuristic strategies. 
Furthermore, the algorithm is highly scalable with running time 0(iVlogiV). 


A. Optimal Percolation 

A network is a set of N nodes tied together by M edges. The vector n = (ni, ... ,1%^) 
represents which node is present and which one is removed. We adopt the convention that 
n* = 1 if node i is present, and n* = 0 if node i is removed (corresponding to an inhuencer). 
The total fraction of removed nodes is denoted with q: 

1 ^ 

g = 1 - — = 1 - (n) . (6) 

' i=l 

We call G{q) the fraction of occupied sites belonging to the giant (largest) connected 
component (in the limit iV —)■ cx) represents the probability of existence of the giant compo¬ 
nent). The optimal percolation problem is hnding the minimum fraction qc of nodes to be 
removed such that G{q^ = 0: 

qc = min{g G [0,1] : G{q) = 0} . (7) 

For q > qc, the network consists of a collection of clusters of nodes whose sizes are 
subextensive. Alternatively, for a hxed fraction q < q^ we search for the conhguration of 
removed nodes that provides the minimal non-zero giant connected component. 

A given node i can be disconnected from the giant component G, either because it is 
directly removed, or because it is indirectly detached by the removal of other nodes. In the 
former case n* = 0, while in the second one rii = 1. Thus, we see that n* cannot tell us 
whether node i belongs to G or not. We then need another variable encoding the information 
that node i belongs or not to G. This variable is the probability of node i to belong to the 
giant connected component, z/j, and we agree to set z/j = 1 if i G G, and z/j = 0 otherwise. 
The fraction of nodes in the giant component, upon removing q of them from the network. 
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is then given by: 


G{q) = ^ 

i=l 

In principle, optimal percolation minimizes the giant component over the conhgurations 
n. However an explicit functional form of G'(n) is not feasible. Our approach is to transform 
the problem into the minimization over n of the largest eigenvalue controlling the stability 
of the percolation solution, which can be written explicitly in terms of n. We hrst derive 
the relation between the vector 1 /= {ui,, z/^v) and the vector n = (rii,..., ujv). This can 
be easily done using a message passing approach [221 [231 ES] • 

Let us consider two connected nodes i and j and orient the corresponding edge from i to 
j. Now let us suppose to ’’virtually” remove j (create a “cavity” at j) from the network and 
ask ourselves if node i belongs to G or not. This information can be stored in an auxiliary 
quantity = 1,0 representing the probability of i to belong to the giant connected 
component in the absence of j. The advantage of using the variables instead of Ui 

is the fact that they satisfy a closed set of equations. Clearly = 0 if = 0. So the 
interesting case is when n* = 1. Recalling that j is momentarily absent from the network, 
the chance that i belongs to G is determined by the event ”at least one among the neighbours 
of i different from j, belongs to G when i itself is virtually removed from the network”. For 
a locally tree-like network this statement can be mathematically translated in the following 
message passing formula [221123]: 




i^j 


rii 


Rd 

k^di\j 




(9) 


where di \ j is set of nearest neighbours of i minus j. We can hnally put back j in the 
network and get the real information Ui as: 


Ui = Hi 


(1 ^k—>-i') 


k£di 


( 10 ) 


The system dehned by Eq. (|^ always admits the solution {ui^j = 0} for all i ^ j 
(regardless of the values of Uj), as can be verihed by inspection. 

As a consequence also {z/j = 0} for all i, which in turn gives G = 0. This solution 
is stable, provided that the largest eigenvalue of the linear operator represented by the 
2M X 2M matrix dehned on the directed links k ^ ^ j'- 
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M 


dui. 






. ( 11 ) 

oi'k^e {i'i^j=o} 

is less than one. We call A(n;g) the largest eigenvalue of M., which depends on the vector 
n and we add also a parametric dependence on the fraction of removed nodes q. Thus, the 
stability of a solution set n of G = 0 is determined by the condition A(n; g) < 1. 

For a hxed fraction q there exist, in general, very many possible conhgurations n that 
satisfy Eq. (@. When q < qc each conhguration n gives A(n;g) > 1, since it is impossible 
to hnd a set of nodes to remove such that G{q) = 0, and this corresponds to the instability 
of the solution {Pi^j = 0} signaled by a value of A(n; q) larger than one. On the contrary, 
when q > qc, we have two different possibilities: there exist conhgurations n such that 
A(n;g) > 1, which corresponds to nonoptimal node removals unable to destroy the giant 
component {G{q) > 0); on the other hand, there can be found other conhgurations for which 
A(n;g) < 1, which corresponds to a fragmented network with G{q) = 0. As we approach 
qc from above, q —>■ g+, the number of conhgurations n such that A(n;g) < 1 (and hence 
G{q) = 0) decreases and eventually vanishes at qc- This situation is exemplihed in Fig. [^. 

We hnd that the matrix jii is given in terms of the Non-Backtracking (NB) matrix B 
[ISIEI] for a locally-tree like random network via the equation: 


M 




= nS. 




( 12 ) 


where 


B 




{ lii ^ = i and j ^ k , 
0 otherwise. 


(13) 


The modihed NB operator Ji4 is represented by a 2M x 2M matrix on the 2M directed 

edges of the network. For example, for the simple following graph with N = 6 and M = 6: 

1 



(14) 
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the corresponding M matrix is a 12 x 12 matrix that reads (the associated non-backtracking 
matrix jB is obtained by setting rij = 1): 


1^2 2^1 2^3 2^5 3^2 3^4 3^5 4^3 5^2 5^3 5^6 6^5 


1 ^ 2 
2 ^ 1 
2^3 
2^5 
3^2 
3^4 
3^5 
4^3 
5^2 
5^3 
5^6 
6^5 


0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 


0 

0 

0 

0 

n2 

0 

0 

0 

n2 

0 

0 

0 


n2 

0 

0 

0 

0 

0 

0 

0 

^2 

0 

0 

0 


n2 

0 

0 

0 

n2 

0 

0 

0 

0 

0 

0 

0 


0 

0 

0 

0 

0 

0 

0 

ns 

0 

ns 

0 

0 


0 

0 

ns 

0 

0 

0 

0 

0 

0 

ns 

0 
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0 

0 

ns 
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ns 

0 

0 

0 
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0 

0 
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0 

0 

ns 

0 

0 

0 

0 

0 


0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

(15) 


The largest eigenvalue of the NB matrix B is positive and simple, as a consequence of the 
Perron-Frobenius theorem [2l], and the corresponding eigenvector is such that all compo¬ 
nents are positive. 

The NB matrix B has recently received a lot of attention in context of detectability 
of communities in complex networks [25l [26]. In that problem the interesting eigenvalue 
is the second largest, since the corresponding eigenvector can be used to label the nodes 
in different communities. This can be easily understood for the case of two communities. 
Since the eigenvector corresponding to the second largest eigenvalue has both positive and 
negative components, one can assign to one community all the nodes corresponding to the 
positive entries of the eigenvector, and to the other community all the nodes labeled with 
the negative components (the eigenvector corresponding to the largest eigenvalue does not 
work to dehne communities since it has all positive components, so that it is insensitive to 
the community structure of the network). This clustering method can be also generalized to 
the case of multiple (i.e. more than two) communities. This kind of community detection 
protocol can be implemented also using other matrices, e.g. the adjacency and the Laplacian 
matrices. What is crucial is the fact that the NB operator has the optimal performance, in 
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the sense that it is able to detect communities down to the detectability threshold, while 
the other spectral methods fail much before. 

The NB matrix B also intervenes when one linearizes the belief propagation equations, 
and it was hrst used to detect the location of the phase transition in Ref. ng for the 
3-coloring problem. 

In our problem we are interested in the largest eigenvalue of the matrix jii (not of 
B), which is indeed a suitable modihcation of the NB matrix via optimal removal of n*. 
That is, is a NB matrix on a modihed network where some nodes are removed in an 
optimal way (uj = 0). According to the Perron-Frobenius theorem, the largest eigenvalue 
of the matrix is a strictly decreasing function of B, that is if Ad < (meaning that the 
inequality holds entry by entry of the matrices) and Xi B, then A (Ad) < \{B). In our 
case the matrix Ad can be obtained from the matrix B by setting one or some of the n, = 0. 
Therefore the optimization problem for a given q can be rephrased as hnding the optimal 
influencer conhguration n* that minimize A(n; q) over all possible conhgurations n satisfying 
(n) = 1 — q. Calling A(n*; q) this minimum, we write: 

A(n*;g) = min A(n;g). (16) 

n:{n)=l—q 

The optimal threshold qc is the solution of the equation: 


A(n*;?c) = 1. 


(17) 


Still, this equation is hard to solve, since there is no explicit formula for A(n;g) as 
a function of n. To tackle this problem we propose a sequence of approximations to the 
largest eigenvalue (and associated eigenvector) which is based on the Power Method iterative 
scheme that we hnd quickly convergent to the exact solution of the problem. We stress that 
the optimal exact solution for i ^ oo holds only under the assumption that the graph under 
consideration is locally-tree like. 


Before to conclude this section, we notice that for the network depicted in (14), the 


modihed NB matrix does not depend on rii, 77 , 4 ,ne, i.e., it does not depend on the variables 
outside the loop. As a consequence, its largest eigenvalue does not depend on those variables 
as well: A = A(n 2 , 713 , ns). Therefore, removing the nodes at the end of the dangling edge 
does not reduce the eigenvalue A, which is one for the considered network: A(l, 1,1) = 1. 
On the contrary, by removing a node belonging to the loop, e.g. nodes 2, 3 or 5, the network 
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becomes a tree, and the eigenvalue is zero: A(0,1,1) = A(1,0,1) = A(1,1,0) = 0. In 
general, the largest eigenvalue of a tree-network is equal to zero. For networks with one 
loop (unicyclic graph), the largest eigenvalue is equal to one, and for networks with many 
loops A is larger than 1. Thus we see that the result of minimizing the largest eigenvalue of 
the modihed NB matrix is a way to attack the loops in the network. When the eigenvalue 
reaches the critical value one (A = 1) the network consists of a single loop, that is suddenly 
destroyed by the removal of a single node, causing the sharp drop of the eigenvalue from one 
to zero. When the giant component is reduced to a tree-like topology, it can be considered 
as completely fragmented, since any tree can be destroyed with a subextensive number of 
node removals. Thus, our theory suggests that the best attack strategy is to destroy the 
loops. This process is illustrated in Fig. [^. 


B. Mapping of optimal immunization and spreading problems onto optimal per¬ 
colation 


Here we map exactly the problems of optimal immunization and spreading to the problem 
of minimizing the giant component of a network, ie, optimal percolation. 

In the immunization case, the quantity z/j represents the probability of node i to be 
infected. Therefore minimizing the sum 'Yhi is equivalent to minimize the size of the 
disease outbreak, which is obtained by optimally choosing the immunizator nodes. These 
immunizators are exactly the nodes we need to remove to fragment the network. Precisely, 
n, = 0 if node i is an optimal immunizator, and rzj = 1 if not. Note that we can also 
slightly modify the equations to include the case of a transmission probability of the disease 
w smaller than one (the case treated explicitly in our paper corresponds to w = 1). Including 
tc, the main equation reads: 




Ui 


(1 - WUk^i) 

kGdi\j 


( 18 ) 


In this case the optimal immunizators can be still identihed by minimizing the largest eigen¬ 
value of the modihed NB operator A4. The only difference is that the critical threshold qc 
is dehned by the following equation: 


A(n*;gc) = 1/w- 


(19) 
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It should be noted, though, that the quantity N ^ i/j, which quantihes the total fraction 
of infected individuals, is different from the giant component G when tc < 1. 

The case of optimal spreaders is the dual of the optimal immunizators. The optimized 
spreading problem |1] consists of hnding the minimum number of nodes to ’’activate” in 
such a way that the information percolates the network following, for instance, the Linear 
Threshold Model (LTM) dynamics [T71IT8] . The LTM simulates a spreading of information 
process where an individual adopts an opinion or information under “peer pressure”, that 
is, it becomes activated only after a given number of its neighbors are [nillHlEZlEH]. The 
optimal influence threshold represents the minimum fraction of spreader nodes we need to 
activate to spread the information all over the network. The mathematical formulation of 
the problem is the following. Let us dehne Ui^j as the probability that, in the information 
spreading process, node i is eventually NOT activated in absence of node j. Moreover, we 
assign to each node the number Uj, which, in this case, equals zero, Uj = 0, if i is an initial 
spreader, and n* = 1 if not. We note that in the case of immunization, Uj = 0 corresponds 
to an immunized node. Following the Linear Threshold Model (LTM) [T7] of information 
spreading, in order for node i to be activated in absence of node j, the neighbouring nodes 
such that k G di\j must be activated. In the Linear Threshold Model [IZIIIB], this situation 
corresponds to considering uniform weights Wij = 1 for each edge cind threshold of 

activation 9i = ki — 1 for each node with degree ki, such that a node i becomes activated 
if the number of active neighbors is at least 6i. The optimal spreading problem under the 
LTM is to hnd the minimum set of initially activated spreaders, qc, which will percolate 
the information to the entire network, as dehned in Kempe et al. [1]. On the other limit, 
the optimal immunizator problem corresponds to setting 9i = 1, further showing the dual 
relation between the immunization and spreading problem. 

The probabilities for spreading can be computed self-consistently through the fol¬ 
lowing message passing equations [221 [231133] in analogy to Eq. 




Hi 


(1 ) 

kGdi\j 


( 20 ) 


The total probability z/j that i is not activated is obtained from Eq. 
the contribution of Uj^i as well: 


(20) by including 
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(21) 


Vi = Hi 


JJ(1 - Vk^i) 


kGdi 


and the total fraction of nodes not activated in the spreading process is: 


N 




( 22 ) 


i=l 


The optimization problem in spreading nnder the LTM consists in minimizing the fraction 
of inactive nodes G, or eqnivalently, maximizing the spreading over the network. We notice 
the dnal natnre of G in immnnization and spreading. In the former, G represents the 
fraction of inactive nodes, which we want to minimize to maximize spreading. In the later, 
G represents the connected component of individuals who would get infected if the epidemic 
starts in a single node. In this case, we also want to minimize G to reduce the spread of 
the epidemic. This is the basic reason why we are able to bring the two problems under the 
same framework of optimal percolation. 

In activated spreading, the minimization of G is achieved by optimally placing the initial 


spreaders rij = 0. Equations (20) and (21) are formally identical to the equations of optimal 
influence (^, (10). To summarize, in spreading, the giant component G represents the 


fraction of inactivated nodes. Therefore, by minimizing the giant component in spreading 
we are effectively maximizing the spreading of information. This completes the mapping 
between the immunization and spreading problem to optimal percolation. 

We notice that the mapping of the optimal LTM problem to optimal percolation is done 
for a threshold 6i = /cj — 1. For = 1, we recover the optimal immunization problem. These 
two problems can be solved by studying the local stability of the solution G = 0 in both 
cases. This is possible since at qc the transition is of second order, and thus a local stability 
theory is applicable. For other intermediate values of the threshold 6i = ki — 2, ki — 3,.. .2, 
the transition at Qc is of first order, of the type of bootstrap percolation [39]. In these 
regimes, a local stability criterion cannot be applied, and the optimal spreading problem in 
these cases needs to be considered independently. That is, the largest eigenvalue of the NB 
operator is not guaranteed to provide the optimal set when the transition is of first order. 
Nevertheless, we could expect that the proposed Cl algorithm may work as well in this 
regime of discontinuous transition. Another interesting generalization of the present study 
is the case of randomly chosen heterogeneous threshold 9i in the LTM [18] or a hx threshold 
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A further interesting optimization problem is to find the optimal spreaders under the SIR 
or SIS models [12] . In general, this problem cannot be translated into an optimal percolation 
problem, because it does not have a transition from G = 0 to G > 0. Therefore, it cannot 
be treated under our stability theory. However, as for the LTM with intermediate values 
of the threshold, the spreaders identihed by the Cl algorithm could still be expected to be 
close to optimal. 

To conclude this section, we notice that for all optimization problems on locally tree-like 
random networks that can be mapped onto an optimal percolation problem, with a second 
order transition separating the phases with G = 0 and G > 0, our theory holds true, and 
the Cl algorithm can be used accordingly. 

In the next sections, we show that the optimal percolation set is obtained as a (inhnite) 
sequence of optimized “attacks” expressed as successive approximations to the minimization 
of the largest eigenvalue of the modihed non-backtracking matrix. At the most trivial level, 
we obtain the random attack of the network corresponding to random percolation, Eq. 

The zero-order naive strategy corresponding to the lowest approximation of the theory is 


given by the attack on the high degree nodes, Eq. STO The hrst non-trivial collective 
attack strategy is equivalent to hnd the ground state of a two-body spin-glass like system, 
i.e., an anti-ferromagnetic Ising model with random bonds in a random external held at 
hxed magnetization, Eq. ^78| Higher-order approximations consist of increasing many- 
body problems, Eq. ^5^ and produce increasingly better performance compared to previous 
heuristic strategies. 


C. Limit of applicability of the theory of influence 

The mapping of the optimal influence problem onto the optimal percolation problem 
developed so far is strictly valid for locally tree-like networks: in the message passing for¬ 
mulation Eq. the probabilities are assumed to be independent. This includes the 
thermodynamic limit of the class of random networks of Erdos-Renyi, scale-free networks 
[5] and the configuration model (the maximally random graphs with a given distribution 
mi) which are locally tree-like and contains loops that grow as log N BH. Nonetheless, it is 
generally accepted, and confirmed by many works, that results obtained for tree-like graphs 
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apply quite well also for loopy networks, provided the density of loops is not excessively 
large [HI |2Sl EH] • Wherever the number of such topological structures (loops) is abundant 
(e.g., to hnite dimensional lattices), the quality of the results obtained for tree-like networks 
deteriorates. Indeed, the locally tree-like approximation has been successfully used for a 
plethora of other problems, like spin glass models on random graphs, coloring, matching, 
bisection and maximum cut of graphs, and many other constraint satisfaction problems [36] . 


D. Random influence 


The trivial case corresponds to random removal of nodes (random percolation). It is 
obtained by taking the rii at random from the distribution P(n;g) such that the removal is 
decoupled from the non-backtracking matrix: 


N 


P(^\q) = 11(1 - 9 )”’'''' 


( 23 ) 


2 = 1 


Taking the expectation of the matrix M. over n, and exploiting the fact that P(n; q) is 


factorized over the sites, Eq. (12) becomes: 


lEn (1 


( 24 ) 


Therefore, the eigenvalue A(n; g), averaged over n, is given by: 


\(q) = (1 - q)\e, ( 25 ) 

where Ag is the largest eigenvalue of the NB matrix B for a random network. It is well 
known that the largest eigenvalue of the NB matrix is equal to ESI: 

Ag = K — 1, (26) 

with K equals to the ratio of the hrst two moments of the degree distribution: 

K = {e)/{k). ( 27 ) 

The condition X{qc) = 1 is nothing but the famous result for random percolation [21] which 
has been obtained using the NB matrix in [23l EZ], i-e.: 




(28) 
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Thus, when n is decoupled with the NB matrix, the largest eigenvalue of NB captures 
the random percolation threshold for random networks. This result has been previously 
obtained in [23] using similar ideas as used in the present derivation. The largest eigenvalue 
of the NB matrix in case of random removal of nodes is true only when the original graph 
is random. For a generic graph, the largest eigenvalue can be very different and not related 
to the first and second moment of the degree distribution. On the other hand, the optimal 
threshold arises by coupling the removal of nodes with the NB matrix, a case that is treated 
next. 


E. Derivation of the main formula: cost energy function of influence, Eq. (Q 


Next, we derive Eq. Q which holds only on very large locally tree-like graphs. From 
now on we omit q in A(n;g) = A(n), which is always kept hxed. For a given configuration 
n, the eigenvalue A(n) determines the growth rate of an arbitrary nonzero vector wq after 
i iterations of the matrix A4, provided that wq has nonzero projection onto the eigenvector 
corresponding to A(n). Denoting with W£(n) the vector at the £-th iteration, 

W£(n) = jCi^Wo, (29) 


we can write according to Power Method m 


A(n) = lim 


|w^(n)| 


e^oo [ |wo| 


l/£ 


where 


|w^(n)p = (w£(n)|w£(n)) = (wo|(2W^)^Af^|wo). 
For finite i we define the f'-dependent approximant A£(n) as: 


A/(n) = 


|Wo| 


so that we have: 


A(n) = lim A£(n). 


i^oo 


We now derive the analytical expression of Af(n). 

We start by computing the first approximant £ = 1: 


|w,(n)) = V(|wo). 


(30) 

(31) 


(32) 


(33) 


(34) 
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In order to do this, it is convenient to embed the matrix A4, whose dimension is 2M x 2M, 
in a larger space of dimension N x N x N x N. In this enlarged space Ai is given by: 

where each index runs from 1 to A^: i,j,k,i=l,...,N, and Aij is the adjacency matrix. 
Practically, we have represented Ai on the nodes of the network, rather than on the directed 
edges. The Kronecker deltas guarantee the non-backtracking nature of the NB walks. 

As starting 2M-dimensional vector |wo) = |1) in the space of links, we choose the vector 
with all components equal one; the optimal solution is independent of this selection. The 
components of the analogous N x N vector, |wo) in the enlarged space of nodes are given 
by \wo)ij = Aij. 

The right vector |wi(n)) is computed as: 

|wi(n))y = '^Mijke \wo)ke = njAij{kj - 1) , (36) 

k£ 

while the left vector (wi(n)| is given by: 

ij(wi(n)| = ^ ki{wo\Mk£ij = niAij{ki - 1) . (37) 

ki 

The factor ki — 1 will appear frequently in the following, so it is worth to set the residual 
degree: 

Zj =ki-l . (38) 

The norm |wi(n)p is: 


|wi(n)p ij{wi{n)\wi{n))ij = '^AijZiZjninj. 


(39) 






Since the norm of |wo) is simply: |woP = Xli — 2M, we can write the hnal expression 


for Ai(n) from Eq. (32) as: 


Ai(n) = 


2M 


AijZiZjniUj 


V 


1/2 


(40) 


It is useful to give a graphical representation of the previous formula in terms of a dia¬ 
grammatic expansion, which becomes indispensable for higher orders of the iteration. The 
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interaction term in the sum in the numerator on the r.h.s of Eq. (40) can be represented 


as 


Aij{ki - l){kj - l)ninj = 


(41) 


The meaning of the diagram is the following: each time a straight line connects two sites 
i and j, the variables rij and rij are multiplied by each other (the meaning of the arrow is 


explained later in Methods Section IIF in terms of NB walks; for the moment it can be 
thought of as an undirected line). The wiggly lines on the nodes means a multiplication 
by the factor Zi = ki — 1. The diagram is then equal to niUjZiZj. The number of variables 
appearing in the diagram gives the order of the interaction. In this case, corresponds to 
a pair-wise interaction. Thus, the hrst optimization order ^ = 1 corresponds to a 2-body 
problem. We will see that, in general, the f'-order term in the approximant describes a 
2£-body problem. 

Let us compute for £ = 2, |w 2 (n)); 


|w2(n))ij = |wi(n))M = njAijy^ AjenaZjjl - 5*^), 

k£ £ 

and also (w2(n)|; 

ij{w2{n)\ = ^ k£{wi{n)\Mk£ij = UiAij'^AikUkZkil - 6kj) . 
k£ k 

The norm |w2(n)p is given by: 

|w2(n)p = ^ AijAjkAuil - Sik){l - 5j()ziZt,ninjnkni. 
ijki 


(42) 


(43) 


(44) 


There are two types of interactions in the sum on the r.h.s of Eq. (44): a 4-body and a 
3-body interaction. The graphical representation of the former is: 


AijAjkAk£ZiZiTliTljTlk'kl£ 



Z£ 

Tie 


(45) 


The diagram for the 3-body interaction is: 


AijAjkAkii^Zi^ TliTljTlk 
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Then, the i = 2 approximant of the eigenvalue, A 2 (n), is given by: 


Aofn') = 


2M 




ijk£ 


1/4 


(47) 


In the next order i = 3, |w 3 (n)p there appears a term with 6-body interactions: 


= i . i 


terms with 5-body interactions: 



/O 

terms with 4-body interactions: 



and a term with 3-body interaction: 

J ^ 


(48) 


(49) 


We see that the series expansion of the maximum eigenvalue can be written in terms of 
a systematic diagrammatic expansion of increasing levels of many-body interactions. The 
generalization is treated next. 
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F. Interpretation in terms of NB walks and generalization to 2£-body interactions 


Nodes entering in each type of interaction given by |wi(n)p, |w 2 (n)p, and |w 3 (n)p are 
the same nodes visited by a non-backtracking walk of length 1,3,5, respectively (with the 
possibility of traversing an edge multiple times). In general, the diagrammatic expansion of 
the term |wf(n)p will then contain all the possible graphs that can be built using the nodes 
traversed by a NB-walk of length 2^ — 1. This is the reason why we put an arrow in the 
diagrammatic schematization of the interaction terms: emphasizing the connection with NB 
walks. 

The NB graphs in the expansion are built in the following way: (a) For a given £ we 
construct all NB walks of 2£ — 1 steps starting at one node i (with degree fcj) and ending 
in node j (with degree kj). The initial and ending point are indicated by a wiggly line 
indicating their degree minus one. (b) The initial and hnal node of the NB walk do not need 
to be necessarily different, (c) Loops are allowed in the NB walk. Thus, the shortest path 
between i and j might be smaller than the 2£ — 1 steps of the walk, (d) We recall that the 
condition for a NB random walk is only that it cannot come back through the same link 
that it just came on, yet, it can visit the same node several times, (e) As well, the NB walks 
are allowed to travel through the same links multiple times, (f) The number of nodes of the 
NB walk is the order of the interaction, (g) The dominant graph is always a direct path 
(line) of length 2£ —1 and 2£ nodes, where the shortest path between the initial and ending 
node is 2^ — 1; we will see that all the other diagrams with loops are negligible for sparse 
locally tree-like random graphs. 

For instance, in the case £ = 2, we obtain two graphs representing NB walks of length 


2^ — 1 = 3. The graph (45) represents a NB walk of length 3 which traverses 4 distinct 


nodes, and thus it is a 4-body dominant interaction. The graph (46) also represents a NB 
walk of length 3, but this time the NB walk starts and ends in the same node i, rule (b) and 
(c), so this term contributes with a factor zf. 


In the case £ = 3, the leading interaction is graph (49) of 6-body interactions and 5 NB 


steps. The 3-body interaction (49) is a NB walk of 5 steps that starts at the node i on the 
right and traverses two links twice, rule (e), to end up in node j, resulting in a triangular 
3-body problem. 

Next, we show that, in a sparse random network, all the NB graphs with loops can be 
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neglected to 0{1/N), in accordance with the assumption of locally-tree like structure. Thus, 
the main equation Q takes into account only the leading 2£-body interaction for each £-level 
in the diagrammatic expansion to 0(l/iV). This situation is translated to the Cl algorithm 
where the ball is dehned by a radius £ dehned as the shortest path between two nodes in 
the network. 

It is important to note that, for locally tree-like graphs and for very large system sizes 
iV, the terms containing one or more loops are suppressed by powers of 1/N. This can be 
understood through the following argument. Let us consider, for the sake of simplicity, an 
ER random graph, where each edge is present with probability z/N, where ^ is the average 
degree (in ER the average degree is the same as the average residual degree). The total 
number of loops in ER of given length ^ is given on average by (the hnal formula is valid 
for general sparse random graphs BO], the quantity ^ being in general the average residual 
degree): 

where the factor l/2^ comes from the fact that: i) anyone of the ^ nodes in the loop can 
be taken as starting point; ii) the loop can be traveled in two directions. Therefore, the 
probability for a node to belong to a loop of length £ is: 


p(£) 


m 

N 


2iN ’ 


(51) 


and thus it is of order 0(l/iV). The quantity A/'(£) enumerates the number of non self- 
intersecting closed walks. Self-intersecting closed walks have a probability of order 0(l/iV^), 
since the self-intersection is encountered, on average, in a fraction 1/A^^ of the total number 
of nodes. Thus, loops with many self-intersections are suppressed by higher power of 1/N. 
This argument can be generalized to random graph ensembles other than ER, provided they 
have a locally tree-like structure in the limit iV —)■ oo. 

As a consequence, for very large network, the leading term of |w£(n)p for a given i (i.e.. 


the one containing a number of nodes exactly equal to 2i, for instance the diagram (48) 
in the expression of |w 3 (n)p) is already a good approximation, which becomes exact for 
N ^ oo. However, for small networks all the terms should be considered. In Methods 


Section IV we will minimize the eigenvalue A£(n) by taking into account all the possible 
interactions. 
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Since, in the limit N —>■ oo, loops do not contribute to the assessment of the norm |w£(n)|, 
the equation for |w£(n)| simplihes considerably, because we can take into account only the 
terms with exactly 2£-body interactions; the remaining ones with loops decay as 1/N or 
faster. 

The analytical expression for |wi(n)p and |w 2 (n)p given by Eqs. (39), (44) can be 
generalized to any |w£(n)p. We get 




. n. 


*2« ■ 


lll2...*2^ 


(52) 


Each term of the sum in Eq. (52) can be associated to a non backtracking walk of length 
2i — 1, where edges may be crossed multiple times. To each node visited by the NB walk is 
attached a variable rii, and the extreme nodes of the walk have the extra factors Zi-^ (starting 


point) and (end point). When NB walks containing loops are neglected in Eq. (52), the 
formula simplihes considerably, since the only remaining walks are the ones of length 2^—l 
where each one of the 2^ nodes is visited only once. For example, let us consider a loop free 
NB walk of length 2£ — 1 starting from a given node, say node i. It visits all nodes up to a 
distance 2^ — 1, and stops at the hnal node, say node j. The total number of nodes visited 
by this NB walk is 2£. 

Notice that, when loops are neglected, we can approximate the local environment around 
any node by a tree, in line with the original locally-tree like assumption of the whole ap¬ 
proach. A simple way to implement the tree-like approximation is to consider only the NB 
walks of length 2^—1 that start from a given node i and end on nodes j, in such a way that 
these NB walks coincide with the shortest paths between i and those nodes j. Moreover, 
these NB walks contain the products ... rii^^ with all different from each other. 

Hence, we can hnally write down the expression for the leading term of |w^(n)p as: 


N 




n 


Uk 


Oj , 


(53) 


*=1 je5Ball(i,2£—1) \k&'P 2 l-i(i,j) 


where Ball(q£) is the set of nodes inside a ball of radius £ around node i, where the radius 
is dehned taking the shortest path as the distance, 9Ball(i,f') is the frontier of the ball and 
Vi{i,j) is the set of nodes belonging to the shortest path of length £ connecting i and j. 
This is the cost energy function of influence Eq. (|^ given in the main text. 
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G. Odd (2£ + l)-body interactions 


The energy (or cost) function Eq. (53) contains only even 2t'-body interactions. It is 


possible to interpolate with odd interactions by considering an analogous Power Method 
expansion of the eigenvalue in terms of the matrix elements (w£(n)|A^|wf(n)). Indeed, the 
eigenvalue A(n) can be also computed using another series expansion in the Power Method 
as: 


A(n) = lim 


(wf(n)|A^|w£(n)) 


1 l/{2i+l) 


(54) 


(wo|wo) 

The explicit expression of (w£(n)|A^|w£(n)) can be computed similarly to |w£(n)p, following 
the same steps outlined in Methods Section HE As an example, for (wi(n)|Ad|wi(n)) we 
get: 

{^l{n)\M\^Nl{n)) 5ik)ziZkninjnk • (55) 

ijk 


The asymptotic expression for N ^ oo (i.e. the one neglecting loops) is similar to Eq. (53), 
and, for £ > 1, reads: 


N 


(w^(n)|A4|w£(n)) = E n Uk Zj , 


(56) 


i=l j&dBall{i,2e) \k&V 2 t{i,i) 


while for £ = 0 we find: 


N 


{wo|A^|wo) ^ ^ ki[ki - l)n, 


(57) 


2=1 


We can introduce the equivalent of the approximant Af(n) in Eq. (32), that we call A£(n): 

l/{2£+l) 


A'An) = 


(w£(n)|Wl|w£(n)) 

(wo|wo) 


(58) 


H. HD attack at £ = 0 onc-body problem 

It is interesting to consider the i = 0 term, Ao(n), since it reproduces exactly the high- 
degree (HD) strategy attacking the hubs calculated by Cohen et al. [lO]. This term repre¬ 
sents the one-body interaction where the influencers are considered in isolation, only affected 
by the external field, and therefore this strategy lacks the collective influence effects found 
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for £ > 2. It reads: 


Ao(n) = 


(wo|7W|wo) ki{ki - l)ni 


(59) 


(wo|wo) 

The sum on the numerator on the r.h.s represents the energy of a gas of free particles in 
an external (site-dependent) field hi = ki{ki — 1). This field is always nonnegative hi > 0. 
Therefore, in order to minimize Ag(n) it is sufficient to sort the nodes according to the 
external field, and then removing the ones corresponding to the highest Nq fields. Since 
the field hi is monotonic increasing with the degree, the minimization corresponds exactly 
to the removal of the high degree nodes one by one. It is also easy to see that the stability 
condition imposed on Ao(n), 

min A'o(n) = 1, (60) 

n:(n) = l-ijc 

gives exactly the threshold Qc expected from HD as calculated by Cohen et al. m. Indeed, 


putting the expression for AQ(n) in Eq. (60), we have. 


mm 

n:(n)=l-qc 


Yi kijki - l)ni 

Yiki 


= 1. 


(61) 


As we said, the minimization of the numerator in the I.h.s. is achieved by setting n* = 0 
for the first Nq^ highest degree node, and rii = 1 for the remaining ones. Therefore we can 


rewrite Eq. (61) as: 


1 ^ {{ky - {k}') = 1 , ( 62 ) 

where the average (■)' is performed using a modified degree distribution P'{k]qc), which 
depends on qc and represents the degree distribution in the network with the removed hubs. 
To derive the explicit form of P'{k] q^), let us consider first the relation between q^. and the 
original P{k). The fraction q^ of nodes to be removed is 


J2p{k) = q,, (63) 

fc=C 

where ( is the lowest degree of the removed nodes compatible with qc- 

The distribution P'(fc; g^) is the degree distribution of the remaining nodes (i.e. the ones 
for which rij = 1) and is given by: 


P\k]qc) = 


Qc 


-P{k)9iC-k) . 


(64) 


We now solve Eq. (61) in the case of scale free network with degree distribution P{k) ~ 
k~^, degree exponent 7, minimum degree m and maximum degree /Cmax- We also work in 
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the continuum limit, so that we can use integrals in place of the sums. The variable C as a 
function of Qc reads: 


Qc = 


P{k) k dk 


Al -7 _ Ul--y 
S '^max 

7T7I-7 — k ^~^ 

Hi A/max 


(65) 


The average {k)' is given by: 


{k)' = 


P{k) k dk = 


1 7 — 1 7 _ (^2 7 


and by: 


{ky = 


- Qc'-y -2 w}-'^ - /Cmax ’ 

1 7-1 


I - qd - - fcmal 


Putting these expressions in Eq. (62) we hnd the following implicit equation for qc- 

7-2 - ( 3-7 ^ 2-7 _ ^ 2-7 


7-3 m2-7 - J m2-7 - J 


= 1 . 


( 66 ) 


(67) 


( 68 ) 


Equation ( 68 ) simplihes considerably when t 00 . Indeed (for 7 > 2), we hnd the same 

result as in m-- 


1-2 ( 

-- I m- 

7-3 


p2-7 

+ = 2 . 


(69) 


Considering the ensemble where fcmax = mA^ 3 /( 7 -i)^ when N ^ 00 then fcmax —t 00 , the 
relation between ( and gc is C = ci-nd we hnally hnd: 


qi2 7)/{i 7 ) = 2 + -—(gf 
7 — 3 


-7)/(l-7) _ 


1) . 


(70) 


which is the known result for the HD attack m- The solution is shown in Extended Data 

Fig. 


For 7 = 2 , Eq. (70) predicts a zero critical qc for any m, and this is interpreted as an 
extreme fragility of scale-free networks when the degree exponent is close to 2. Indeed, for 
7 = 2 , the network is essentially a star-graph, which is trivially destroyed by removing the 
central hub. This in turn is a consequence of the fact that the natural cut-oh fcmax diverges 
linearly with the system size for 7 = 2 , i.e., fcmax ~ N (see Extended Data Fig. 01 ) • 

Nonetheless the situation changes a lot for other network ensembles, where the cut-off 
can be very large but hnite (which is the case of all networks, both real and synthetic ones). 

and for 7 —)■ 2 gives: 

Ckm-c 


Indeed, Eq. ( 68 ) for a hnite cut-off k^ 

( = m + log 




(71) 
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By expressing ^ as a function of qc via the equation: 


c = 


mkr. 


m + qc{km^^ - m) 
we find the equation for qc as a function of the cut-off 


( 72 ) 


mkr. 


= m + log 


kl. 


_ (" 73 ^ 

m + qcik^a^-m) "" ' + mq^k^i,^ - m) 

The solution to Eq. (73) is shown in Extended Data Fig. For fcmax —t oo, the asymptotic 

behaviour of the threshold qc is given by : 


m 


\og{k^ 


for krr 


—>■ CX) 


( 74 ) 


Therefore, qc still vanishes when fcmax cx) but as the inverse of the logarithm of the cut¬ 
off. This very slow convergence makes questionable the claim that scale free networks are 
extremely fragile under hubs removal. Indeed, as can be seen in Extended Data Fig. [T]d, 
even for /cmax of the order of hundred millions, qc is still of order 0.1. For more realistic 
^max = 10^, which is typical of social networks, qc ~ 0.2 for all 7. In these situations, the 
search for other attack strategies becomes important. 


III. OPTIMIZATION WITH THE CAVITY METHOD 


The eigenvalue A(n) can be minimized by using the cavity method from spin glass theory 
[28] since we work with sparse graphs. The method can be applied in practice to the hrst 
order approximation to the eigenvalue = 1, which is a pair-wise interaction, i.e., Eq. S|40j 


Ai(n) 


wi(n)| 

|wo| 


(75) 


For higher order many-body interactions, the cavity method becomes much more involved 
and we will pursue other solving strategies. 

The analytical Replica Symmetry (RS) solution obtained with the cavity method for this 
pairwise model allows us to compare with the solution given by EO. This is a very useful 
check of the correctness of the problem solution, since both method are mathematically not 
rigorous. An improvement over the RS solution can be obtained by applying the so called 
Istep replica symmetry breaking cavity method (IRSB) [28l|36|, which should be compared 
as well with the EO solution. This will be done in a future work. Here we only note that. 
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as far as the assessment of the ground state energy is concerned, the difference between the 
RS and the IRSB estimations is, typically, very small (for example it is less than 2% for 
spin glass models on random graphs [3S])- We expect that the same scenario holds true also 
for the model defined here. Furthermore, a deeper physical insight can be obtained when 
we recast the problem in terms of spin glass theory as we show next. 


The energy (cost) function of the problem is, Eq. S39 


£:(n) = |wi(n)p = -l)ninj . (76) 

ij 

The physics of this system is made more transparent when the problem is formulated in 
terms of Ising spin variables s, = ±1. The translation of the problem in the language of 
statistical mechanics will turn the optimization problem to one of finding the ground state 
of a spin-glass system. The relation between Sj and Uj is given by 


Si = 2ni-l. (77) 

Note that the state Uj = 0, meaning that node i is removed (influencer), corresponds to 
the spin down state s* = —1. On the other hand the state n* = 1 (node i not removed) 
corresponds to spin up Sj = 1. Using these new variables the energy function takes the more 
familiar form of an Ising model: 

S{s) = -'^SiJijSj - '^HiSi + C, (78) 

(h> * 

where the first sum on the r.h.s is over the pairs {ij) of nearest neighbours sites in the 
network. The coupling constants Jij represent the interactions between the spins and they 
depend on the details of the network. Explicitly they read: 

Jij = -^Aij{ki - l){kj - 1) . (79) 

The local field Hi depends also on the topology and is given by: 

H, = -hi{ki-l){kr-l) , ( 80 ) 

where kf"' is the average nearest neighbors degree of vertex i, dehned as: 

jedi 
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The constant C does not depend on the spin variables and can be ignored in the minimization 
problem (bnt mnst be inclnded in the evalnation of the energy <f^(s)). 

We introdnce also the magnetization, 

m = ^ SijN, (82) 

i 

of the conhgnration s, which is related to the nnmber of removed nodes q by the eqnation 


m = 1 — 2g. 


(83) 


Note that £{m = —1) = 0, and S{m = +1) = {k){K — 1)^ (for nncorrelated networks), where 


The physical system dehned by the energy fnnction (78) is a disordered antiferromagnet 
{Jij < 0) in a random external magnetic held. Remarkably, the disorder in the model, Jij, 
comes from the randomness in the network via the adjacency matrix Aij, even if for a given 
instance of the problem both the conplings and the magnetic held are deterministic (i.e. 
hxed by the topology of the nnderlying network and the degree). 

The problem we have to solve is tantamonnt to hnd the gronnd state of the system with 


energy fnnction (78). More precisely, we want to hnd the gronnd state for a hxed valne 
of the magnetization m, which corresponds to keep hxed the nnmber of removed nodes 
q. This problem represents a qnite novel system for spin glass theory since the conpling 


Jij depends explicitly on the contact network via Eq. £79 This conpling between the 
nnderlying network and the disorder is a qnite nniqne featnre of the optimal percolation 
problem at the ^ = 1 pair-wise level. The problem becomes spin-glass when the system is 
forced to satisfy a given magnetization, i.e., a given fraction of inhnencers, which is a global 
constraint. This additional constraint of constant magnetization represents a problem for 
the cavity method, since it cannot be enforced locally. Nevertheless the problem can be 
circnmvented by introdncing an external held H that is chosen self-consistently to hx the 
desired valne of the magnetization as done in |13]. In practice, we introdnce the Legendre 
transform of the energy fnnction T(s), dehned as 

- MH , 

= -M , 


Eh{s) = 8{s) 
OEh 
dH 


(84) 


where M = Si is the global magnetization. At this point we have to minimize the fnnction 
Eh{s) nnder the global constraint dehned by the eqnation dE/dH = —M. This can be done 
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by properly gauging the external field H during the iteration of the cavity equations, as we 
will explain below. 

The cavity equations for the system defined by the energy function Eh{s) are a set of 
equations for the cavity fields hi^j and the cavity bias Ui^j, one for each directed edge of 
the graph. These variables can be interpreted as messages exchanged by the nodes: the bias 
Ui^j represents the incoming message into node j traveling along the edge connecting i and 
j; the field is the outgoing message from node i towards node j. Outgoing messages are 
computed from the incoming ones in a self-consistent way. The cavity field hi^j quantifies 
the tendency of spin i to be -|-1 or —1, when the spin on node j has been pruned from 
the network (whence the name cavity). The cavity bias Ui^j is determined by optimizing 
between the interaction Jij and the cavity field hi^j. More precisely, the cavity equations 
for cavity bias and cavity fields at zero temperature read: 


hi^j — H{M) + , 

kGdi\j ^ 35 ) 

Uk^i = -sign{Jikhk^i)mm{\Jik\,\hk^i\) . 


Once a solution of the cavity equations (85) has been found, the total local effective field hi 
acting on spin i can be computed through the formula: 


hi = H{M) + W ^'^Uk- 

k£di 


( 86 ) 


The main hypothesis of the cavity method is the existence of a single pure state, which can 


be rephrased as a hypothesis on the uniqueness of the solution of the cavity equations (85). 


This is called the replica-symmetric (RS) cavity method. It is also possible to generalize the 
method to incorporate multiple solutions (the so called cavity method at the level of Istep 
Replica Symmetry Breaking, IRSB [2H])- The analysis of this second method will come in 
a follow-up work. Here we limit ourselves to the RS cavity method. 


The cavity equations (85) can be interpreted as updating rules for a message passing 


algorithm, and therefore they can be solved iteratively starting from a random initial con¬ 


dition. In practice we add a ’’time” label t to the cavity fields and we rewrite Eqs. (85) as 
dynamical message passing equations: 


ft.hj ^H(M)+H‘+ Y, 

kGdi\j 


U 


(t-1) 

k^i 


(87) 
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Since the magnetization M has to be kept fixed, the external field H{M) also needs to be 
updated at each step of the iteration. Therefore, after a total update of the cavity fields 


hi%j, the field H{M) = is recomputed by solving the equation: 


^sign ( + W + 


u't, I = M, 


( 88 ) 


kadi 


The solution of the Eqs. (85) corresponds to the fixed point of the map defined by Eqs. 


(87) and (88). 


Once the solution of the cavity equations has been found, the RS estimate of ground 
state energy is given by: 


- E' 

* ii'j) 


■l] ! 


(89) 


where the site and link energies and eij are given, respectively, by 


= 


max [ H{M) + W + E 




k£di 


kadi / 


(90) 


dij max (^hi^j T J^j T J^j 

hi^j Jij T hj^i^ hi^j -)- tJij hj^i^ . 

From the knowledge of the function we can compute the energy by inverting the 
Legendre transform: 

+ MH{M) , (91) 

where H{M) is the external field which produces the desired value of the magnetization 


M, given by the fixed point of Eq. (88). The value of £^ is all we need to compute the 
optimized eigenvalue. Since |wo| = \/N{k), we finally find: 


Af (m) = 


/gRS(^) 


{k) ’ 


(92) 


where e^®(m) = £ff /N is the (intensive) energy per spin. 


We observed that the message passing equations (87) never converge to a stable fixed 


point. This is a consequence of the existence of very many solutions to the cavity equations 
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(85), which implies that the replica symmetry is broken for this system. Nevertheless, the 
non-converged messages can still be used to estimate the energy In practice we run 

the algorithm for a maximum number of Tmax = 10® iteration. Then we use the current 
value of the cavity fields {K^j} to compute the energy of the system, and we average the 
energy on T^ax niore iterations. 

In Extended Data Fig. we show the optimized eigenvalue as a function of the 
removed nodes q = 1—2m, computed for an Erdos-Renyi network with mean degree {k) = 3.5 
and size N = 10^. We find the transition point at this 2-body interaction approximation to 
be Qc = 0.248. 

The continuous transition from the phase with Af® = 0 to the phase with Af® > 0 
observed in Extended Data Fig. [^is an artifact of the 2-body interaction considered here. 
Indeed the real optimal eigenvalue (i.e. the eigenvalue including cx)-body interactions) has 
to jump from zero to one discontinuously at qc as depicted in Fig. [^. The reason is 
that the largest eigenvalue of the non-backtracking matrix of a tree-graph is zero, while 
the largest eigenvalue for a tree plus one single loop is one. Adding more loops increases 
the eigenvalue. Since there are no other possible networks between a tree and a unicyclic 
graph, the largest eigenvalue of the non-bactracking matrix cannot take values in the interval 
(0,1). As a consequence it has to jump discontinuously from 1 to zero at q^.. In the 2-body 
approximation this jump is smoothed by a continuous line. By considering higher order 
interactions the eigenvalue would remain zero closer and closer to the critical threshold qc- 
This is evident, for example, in the problem with 3,4 and 5 interactions, that we solve 
using extremal optimization next. We expect that adding more and more interactions the 
eigenvalue has several (continuous) transitions, when departing from zero, for smaller and 
smaller values of q and eventually for interactions of infinite order it jumps discontinuously 
from zero to one exactly at qc- 

In the inset of Extended Data Fig. [^we show also a comparison between the RS cavity 
method and extremal optimization done in the next section. As can be seen the difference 
is very small, and we believe that they are actually very close to the true optimal result. 
The size of the ER network in this case is A^ = 128, for which EO gives the actual ground 
state. In favor of this conjecture we can also say that the IRSB estimate, which is in general 
more correct that the RS one, is anyway very close to the latter, and would lie in between 
of the two curves shown in the inset of Extended Data Fig. The eigenvalue estimated 
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with RS is anyway slightly smaller. This is a typical feature of the RS cavity method, in 
the sense that it gives a lower bound (and not an upper bound) to the ground state energy 
of the system. On the contrary, EO provides an upper bound to the optimal threshold. 
As the inset of Extended Data Fig. shows, both the lower bound (RS) and the upper 
bound (EO) are very close to each other, and therefore to the true optimum. Therefore, we 
observe that there can be different ways to analytically assess the location of the optimal 
threshold. Conversely, this does not mean that different methods are equally able to hnd the 
actual configuration of optimal influencers, even if they give similar analytical estimation 
for the threshold. A more important issue in any NP problem, perhaps the most relevant 
for any practical purpose, is Ending a scalable algorithm which approximates the optimal 
configuration as better as possible and can be used for very large networks. This is the main 
reason why our Cl algorithm was designed for. 

To conclude this section we want to observe that the antiferromagnetic nature of the 
optimal percolation problem is not totally unexpected. Indeed, the antiferromagnetic in¬ 
teractions between nodes reflect the intuitive idea that immunizing contiguous nodes is less 
efficient than immunizing them in a staggered way. 


IV. MINIMIZATION WITH EXTREMAL OPTIMIZATION (EO) 

In this section we describe another method for the minimization of the eigenvalue A(n), 
called Extremal Optimization (EO) [22], which has the advantage, with respect to the cavity 
method, to be easily implemented for higher £-order of |w£(n)|. However, EO is still not 
scalable for large networks, for which we will implement Cl. Nevertheless, EO makes use 
of the full energy function, including loops, and can be used to extrapolate the solution 
for large networks where EO is not applicable anymore. Indeed, the EO algorithm is an 
efficient method to find nearly optimal solutions. It was used successfully to find the ground 
state energy of spin glass models on random graphs, where it was shown to be practically 
identical to the best available analytical prediction [22]. We believe that also in our case 
r-EO is very close to the optimum when extrapolated to large systems. In Extended Data 
Figs and 1^ we estimate the optimal solution for large networks using a finite size scaling 
analysis extrapolating to the infinite size limit and £ —>■ oo, as explained next. 

To explain the method in the simplest way let us consider again the lowest non trivial 


59 


approximation to the eigenvalue A(n) ~ and the corresponding cost function £^(n): 

£:(n) = \wi{n)\‘^ = ^Aij{ki-l){kj-l)ninj. (93) 

ij 

We now assign to each variable rii the htness ftp 


6j (j 1) ^ ^ Aij (^kj 1) Tij , 


so that we can rewrite the energy function (93) as 


(94) 


£:(n) = ^ biUi . (95) 

i 

Notice that this is similar to the form we adopt to define Cl in Eq. (|^. The Cl-algorithm 
is an adaptive version of the EO algorithm, in a sense that will be explained in Sec. |Vj The 
EO algorithm being the exact minimization of the largest eigenvalue of the NB matrix, 
which can only be achieved for small systems. 

Each node in the state n* = 0 (removed) gives zero contribution to the energy, while 
nodes for which Uj = 1 give a contribution equals to their htness. Therefore, to minimize 
the energy T(n) we have to hnd the set of nodes with the lowest htness, under the usual 
constraint = A^(l — q). Note that the htness bi of node i depends on the states Uj of 

its neighbours j. 

The aim of EO is to explore the space of states looking for the conhgurations with the 
smallest htness. Let now explain how it works. For a hxed q, in the beginning the variables 
rii are assigned at random into two sets: set Sq containing the qN nodes to be removed 
Uj = 0, and set Si containing (1 — g)A^ nodes with Uj = 1: 


Si = {f : Ui = 1} , 


(96) 


5*0 = {« : Ui = 0} , 

with the constraints l^ol = Nq, and |S'o| + |S'i| = N. The initial separation of the nodes 
in these two groups is made arbitrarily. Then the htness 6* corresponding to this initial 
conhguration Co are evaluated and sorted separately for the two groups. The hrst move 
consists in exchanging the variable rii in with the largest htness, with the variable rij in 
So with the lowest one. In other words, we set n* = 0 and nj = 1. This move does not 
change the sizes of Si and So and hence the global constraint = -^(1 ~ remains 
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satisfied. The energy function (95) corresponding to this new configuration Ci is evaluated 
and if T(C'i) < £{Cq) the conhguration Ci is stored together with the value of its energy. 
The process is repeated by recomputing the new fitness and swapping the variable with the 
highest value of bi from Si with the variable corresponding to the lowest one in So- Note 
that the moves are accepted unconditionally at each step and only the best conhguration 
found so far is saved. The algorithm is terminated after a maximum number of iteration is 
reached. 

What we have described so far is the basic EO algorithm. It can be improved by intro¬ 
ducing a tunable parameter, called r, so that we will refer to it as r-EO. In this version of 
the algorithm, the choice of the variables to be swapped is not performed deterministically 
by selecting the ones with the largest and smallest htness, but, instead, they are picked 
up using a random selection. This may look counterintuitive at hrst sight, since we would 
not expect any improvement by randomizing the choice. Actually this is not true, and an 
improvement can be achieved, provided that the random rule is chosen judiciously. 

In the r-EO algorithm, we sort the htness in the two sets So and Si, in increasing order 
in Si and decreasing order in Sq: 


fcn(i) > bn{ 2 ) >■■■> bu{\Si\) > 

&A(1) < &A(2) < ■ ■ ■ < &A(|So|) ) 

where 11 and A are two permutations of the labels i of the variables in Si and So respectively. 
The worst variable in Si (the one with the highest htness) is nn(i), that we want to change 
with the worst variable in So, i.e., nA(i) (the one with the lowest htness). In the simple EO 
algorithm this is exactly what we were doing: exchanging nn(i) and nA(i). Now we rank the 
variables according to their htness. 

For the variables in Si the worst variable is nn(i), which has rank 1, while the best variable 
is Ends’ll); which is of rank IS*!!. For the variables in So the variable of rank 1 is nA(i), while 
nA(|So|) rank |S'o|. Then we consider the following probability distribution over the ranks 
r: 

F(r) oc , (98) 

with r G [1, |S'i|] or r G [1, IFqI] for the ranks of variables in Si or So, respectively. At each 
update, we draw two numbers ri and ro from P{r) and then we swap the variables nn(ri) 
and nA(ro)- Then the algorithm proceeds as in the original EO. Note that for r —)■ oo, we 
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recover the deterministic EO algorithm, swapping only the worst variables. The idea behind 
the choice of the scale free distribution P(r) is to ensure that no variable gets excluded from 
changing set, while giving higher priority to the variables with worst htness. The random 
selection of the variables has the advantage, over the deterministic process, to make possible 
global reconhgurations of the system, thus climbing over the energy barriers and hnd better 
minima. In our problem we found that a value of r = 1.7 gives the best results. 

As an application of this method, we perform the minimization of the energy function 
£^(n) on an Erdos-Renyi network with average degree {k) = 3.5. We considered different 
system sizes N = 2^, 2®, 2^, 2®. For each size N we took the average of the ground state energy 
over 100 realizations. For each instance we performed updates of the rEO routine. The 
results for the eigenvalue A(g) = \j£{q)/N{k) are shown in Extended Data Fig. 

We observe that the hnite size scaling of the optimal influence threshold Qc, which is the 
solution of A(gc) = 1, is consistent with the scaling law: 

q,{N) = q,{oo) + A7V-2/3 ^ ^gg) 

where A is a coefficient independent from the size N. This scaling form is the same of 
the hnite size correction to the thermodynamical energy density of a spin glass system 
with pairwise interactions (e.g. the SK model) at and below the de Almeida-Thouless line. 
Actually this scaling form is observed also for other interesting thermodynamic quantities, 
like the second cummulant of the overlap distribution [H]. In that case the anomalous 
scaling (as opposed to the more natural 1/N correction expected from the central limit 
theorem) is due to the existence of inhnitely many zero modes, whose volume grows as 
In the language of our model, these zero modes represent the inhnitely many way to choose 
the set of optimal inhuencers. It would be very interesting then to interpret what kind of 
hidden symmetry relates all these set of optimal inhuencers. 


A. r-EO with multibody interactions 

In the general case the energy function we want to minimize is: 

T(n) = |w^(n)|2 (100) 
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which involves at most 2£-body interactions. To treat systems with at most a (odd) number 
of {2£ + l)-body interactions, we consider the energy function: 

T'(n) = (w£(n)|A^|w<.(n)) . (101) 

In order to apply the EO algorithm to systems with many-body interactions, all that we 
have to do is to change the definition of the fitness 6*. For example, in the case of a system 
with 4-body interactions, described by |w 2 (n)P, we set bi as: 

hi ^ ^2fc)(l bjljZiZ^TljTlf^Tli^ (102) 

jki 

where Zi = ki — 1. 

After that, the algorithm can be applied exactly in the same way as we did for the 
system with two body interactions. We use the algorithm to minimize the energy function 
of a system with 3,4, 5-body interactions as shown in Extended Data Fig. Two comments 
are in order. Firstly, we note that the eigenvalue is zero for a larger interval of values of q 
with respect to the one computed in the system with 2-body interactions. This observation 
corroborates the idea that in the limit of infinitely many interactions the eigenvalue jumps 
at Qc from zero to one. 

In Extended Data Fig. we show the threshold qc{N) as a function of the system 
size for different values of the order of the many-body interactions p. The thermodynamic 
limit for each many-body interaction q'^{p) is obtained hy N ^ oo (the ^/-intercept in the 
figure). The value at p = 1 represents the system with one-body interaction (equivalent to 
HD). The value p = 2 represents the system with 2-body interactions and energy function 
|wi(n)p; p = 3 corresponds to the system with (at most) 3-body interactions and energy 
(wi(n)|Al|wi(n)), as summarized in the following Table for the first three levels. 



Even interactions 

Odd interactions 

Energy function 

£i{r)) = |w<>(n)p 

= (w£(n)|Al|w£(n)) 

Order of Interactions 

p = 2£ 

p = 2£ + l 

Leading diagram for i = 0 


^ one-body 

Leading diagram for £ = \ 

^ ^ ^ two-body 

^ ^ ^ ^ three-body 

Leading diagram for £ = 2 

i ^9 four-body 

i five-body 
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In Extended Data Fig. S>. we extrapolate the infinite size threshold q^{p) to the limit 
of cxD-body interactions, i.e., for p —>■ oo. The scaling of with 1/p is well consistent with 
a linear behaviour. We obtain the p = oo limit of q^{p = oo) = g/P* from a least-squares 
fit. For ER networks with average degree (k) = 3.5 studied here, we find g/P* = 0.192(9). 
This is the value of the optimal threshold shown in Fig. It in the main text. 


V. Cl ALGORITHM 


We have shown so far that the problem of finding the optimal set of infiuencers can be 
solved by minimizing the following cost function which is the leading order approximation 
in 1/A: 


N 


Ee{n) = 12 


n 




( 103 ) 


i=l 


jg9Ball(jy) \kOPi{i,j) 


where ^^(n) = |w(£+i)/ 2 f for £ odd (corresponding to the energy function <£^(n) in Eq. S100), 
and ^^(n) = (w£/ 2 |Ai|w£/ 2 ) for £ even (corresponding to £'{p) in Eq. ^ 101). We recall that 
Zi = ki — 1. We define the collective influence strength at level £, of node i as: 


Cl£(z) = Zi ^ n 1 


jS9Ball(i,£) \kOPi(ip) 


and we can rewrite Eq. (103) as: 


( 104 ) 


N 

E,(n) = 5^CI,(^) . (105) 

i=l 

Notice that Cli{i) is basically the same as the fitness bi of the EO algorithm, and precisely: 
Cl£(i) = biUi. A fast and efficient way to minimize the cost function ^^(n) is to adaptively 
remove the nodes with the highest collective influence CR(i). When all the nodes are present, 
corresponding to n = 1, C\i{i) evaluates: 


CR(i) = Zj ^ Zj . (106) 

This is the expression of CR(z) given in the main text Eq. (|^. By computing this 
quantity for each node, we can find the one with the largest collective influence and then 
remove it. We stress that the frontier of the Ball: 9Ball(i, £) consists of all the nodes j that 
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are at a distance ^ from i, the distance is measured as the minimum path between i and j. 
This dehnition is consistent with the fact that we have neglected the NB walks with loops 
in the dehnition of the energy functional Eq. ^53] for large networks, and therefore also in 


Cl£(i) Eq. ^106, since they scale as 0{1/N) in random networks as discussed in Section 

HB 

After the removal, the network consists of iV — 1 nodes, and we can proceed as before, 
looking for the next node with the largest Clg. Since the removal of the hrst node changes the 
degree of its neighbours, we need to decrease their degrees by one before recomputing their 
Ch- Removing one by one the nodes according to this adaptive principle we can destroy 
the network in a nearly optimal and very fast way. Besides, we can signihcantly speed up 


the algorithm by decimating a hnite fraction of nodes at each step (see Section VB). The 
algorithm’s performance increases by using larger values of the radius i of the Ball(i,f'). In 
Extended Data Fig. [^we show the results for different values of i. We observe that already 
for £ = 3,4 the algorithm reaches the top performance. 

When i becomes larger than the network diameter, then Ch(i) = 0. In this situation 
different nodes are not distinguishable by the algorithm, and thus, the method is basically 
indistinguishable from a random one. Thus, the parameter i should not exceed in practice 
the original network diameter. We also notice that dangling ends give zero contribution 
by Cl, and hence they are ignored by the algorithm. This is expected since dangling ends 
should have zero influence in the network. 

The Cl algorithm Eq. ([^ is based on Eq. (|^ which contains the many-body collective 
interactions that we refer to as “collective influence”. The Cl algorithm incorporates the 
collective effects by considering the adaptive nature of the algorithm. The adaptiveness of 
the Cl algorithm, usually called decimation in the spin glass literature [36], is a collective 
way to select influential nodes, since the removal of each node depends heavily on the history 
of the process. 


A. Optimization for G{q) / 0 

The theory we developed for the optimal fragmentation of networks allows us to compute 
the optimal influence threshold Qc, i.e. the smallest number of nodes to remove such that 
G{qc) = 0, together with the corresponding conhguration n*. 
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When q < Qc the giant component is nonzero, a consequence of the fact that the system 
of Eqs. ([^ has another stable solution different from identically zero. Therefore, 

for q < Qc the stability of the new solution G{q) 7 ^ 0 is no more controlled by the non¬ 
backtracking operator, but a more complicated operator comes into play that depends on 
the form of the solution itself. To hnd the spectrum (or even the largest eigenvalue) of this 
new matrix we have necessarily to know the solution of the problem. This circumstance 
depauperates the method of its power in g < q^, since we need the solution to the problem 
to solve the problem itself. In the regime q > qc, where G{q) = 0, this solution can be easily 
guessed, as we did, but for q < q^. no simple ansatz can be adopted. 

What can we do in the regime q < qc to minimize the size of the giant component? 

We know that the conhguration n* corresponds to a zero giant component. Assuming 
that this conhguration is the optimal one (this hypothesis is not crucial in what follows 
and can be relaxed by saying that n* is the best approximation to the true optimum), then 
the optimal trajectory in the conhguration space, starting from the point n = (1,1,1,..., 1) 
corresponding to g = 0 and G( 0 ) = 1, must end up at the point n* at g = qc where G(gc) = 0 . 

So far we know the hnal point n* and we would like to travel back the optimal trajectory 
up to the initial point n = 1. In order to do that, let us suppose to decrease inhnitesimally 
the fraction of removed nodes g from its critical value qc, that is g = gc — dg [dq can be 
taken equal to l/N, so that Ndq = 0(1)]. This amounts to explore a neighborhood of the 
conhguration n*, consisting of a collection of conhgurations n' in which a number Ndq of 
components u' = 0 is turned into n[ = 1. Here we are making the crucial hypothesis that 
the optimal trajectory is continuous, in the sense that in going from g —)■ g — dg only a 
number Ndq of components Ui is changing state. Under this assumption the trajectory can 
be followed adiabatically up to the point n = 1. 

Mathematically, this can be expressed by saying that the Hamming distance between 
two neighbouring optimal conhgurations x and y: d(x, y) = Xlili 1 ^* ~ dil) is equal to 
d(x, y) = Ndq. This hypothesis may not hold in the case where the optimal conhguration 
X corresponding to Nq removed nodes, and the one y corresponding to N{q — dq) have an 
Hamming distance much larger than Ndq. In this case the optimal trajectory has disconti¬ 
nuities, jumping from one point to another which are not close to each other. Physically this 
correspond to the fact that the optimal state y cannot be obtained from the optimal state x 
by hipping a hnite number of components rii, but requires a global rearrangement of the sys- 
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tern, which amounts to change the state of much more variables n*, whose number scales as 
iV“ with exponent a G (0,1]. This situation takes place in spin-glass systems with Full-RSB 
thermodynamics, where this chaotic behaviour is observed as a function of the temperature 
Ha (or as a function of other control parameters like the bond strengths and the magnetic 
field). In that case, when the system is cooled from a temperature T to T — dT (with T 
below the critical point: T < Tc), the Gibbs state corresponding to the higher temperature 
does not survive after the cooling, but, instead, a completely new equilibrium state appears 
at T — dT (i.e. if we sample a typical equilibrium configuration at temperature T, this will 
be very distant, in the Hamming sense, from a configuration sampled at T — dT). 

It is highly plausible that the same situation (a chaotic trajectory) is realized also in 
our problem. To keep things simple, we don’t explore this scenario, and analyze only the 
consequences deriving from the hypothesis of a smooth optimal trajectory from n* back to 
1. This approach will give us a very efficient algorithm to minimize the giant component 
in all the interval q G [0,gc), at no additional computational cost. Therefore we take this 
performance as a practical justification of the main assumption, leaving to a future work 
the treatment of the more complicated chaotic scenario. 

To take up the threads of our discussion, let us assume that optimal configurations lie 
close to each other. Knowing the optimal configuration at the fraction g, we should be able 
to find the new optimal one ai q — dq by changing the state of few variables, and actually 
only one if we take dq = 1/N. Practically we have to find the new optimal configuration by 
changing the state of a single variable from 0 to 1. Practically we proceed in the following 
way. At g = gc, we have G(gc) = 0. The corresponding configuration n* contains Nqc 
variables Uj = 0 (removed nodes), and A^(l — qc) variables rii = 1 (nodes that are present). 
At this point we start to add back to the network the nodes using the following algorithm. 
We assign to each removed node n, = 0 an index c{i), which is calculated as the number of 
clusters that node i would join if it were reinserted in the network, independently of their 
sizes. Then we put back in the network node i* such that i* = argmin^,.„_Qc(z), by changing 
n* = 0 into n* = 1 (see Extended Data Fig. [^. The idea behind this method is the fact that 
we want to keep a maximally fragmented network after each reinsertion of nodes. We keep 
on reinserting nodes using the same criterion, until no node is left for which n* = 0. After 
each reinsertion the indexes c{i) are recalculated, and then the new node with minimum c{i) 
is chosen. 
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The running time of this algorithm is 0(MiV log iV), where M is the number of edges. 
Indeed, 0{M) operations are needed to assign the indexes c{i) and 0(A^log A^) to sort them. 
As we did for the case of the main Cl algorithm, the time complexity of this algorithm can 
be reduced to 0{N log N) (if M = 0{N)), by reinserting a finite fraction of nodes at time. 


B. Scalability of the Cl algorithm 

The time complexity needed to compute the quantity Cl£(i) is proportional to the number 
of edges inside the ball Ball(i,^)- Since the radius i is taken hnite, this calculation takes 
a time 0(1) for each node (even if the prefactor increases with i). Thus, to compute 
the Gli{i) for all i requires 0{N) operations. Sorting the Cl£(z)’s takes 0{N \ogN). The 
algorithm is terminated when a number Nqc of nodes is removed. Therefore, removing the 
nodes one-by-one, the total time complexity would be 0{N'^\ogN). Actually we can keep 
the computational complexity to 0{N\ogN) without losing any performance, by simply 
removing a finite fraction of nodes at each step (with a prefactor depending on the percentage 
of nodes hxed at time). In the next Section we explore the performance of the Cl algorithm 
for different adaptive/decimation steps. 


C. Effect of the percentage of fixed nodes during adaptive Cl 

In this section we show the performance of Cl as a function of the percentage of removed 
nodes at each step of the adaptive algorithm. Indeed, removing a finite fraction of nodes at 
time reduces the time complexity from log N (corresponding to the one-by-one removal) 
to A^logA^. In Extended Data Fig. [^we show the effect of the percentage of fixed nodes at 
each adaptive step on an ER network of A^ = 10® nodes and average degree {k) = 3.5. As the 
hgure shows, the performance of Cl is practically unaffected by the removal of up to 0.25% 
of nodes at time (i.e. 250 nodes for the considered network) compared to the one-by-one 
removal. 


VI. COMPARISON WITH OTHER HEURISTIC METHODS 


In the main text Fig. |^we compare our solution with heuristics: high-degree and high- 
degree adaptive pen], PageRank [7], kcore [12], eigenvector [321 closeness [31] central¬ 
ities. We also compare in Fig. [S] for Twitter and Mobile Networks, Cl with HDA, HD, PR 
and k-core which are the only heuristics that are scalable to these large-scale datasets. It 
remains to compare our results with other popular heuristics which do not scale well with 
system size, and therefore we use smaller systems of 10^ nodes: betweenness centrality [35] 
and equal-graph-partitioning m- We use the same size and parameters of the scale-free 
network used in m- The hnal comparison is with BP [H] and it will be done in the next 
section. 

Betweenness centrality (BC) [35]. Betweenness centrality of node i is the sum of the 
fraction of all-pairs shortest paths that pass through i. BC is a very popular tool for network 
analysis, which has applications in different helds, from community detection to the human 
brain. However, it comes with a high computational cost that prevents the examination 
of large graphs of interest. The best algorithm for BC computations has 0{NM) time 
complexity for unweighted networks with N nodes and M vertices. It is not fast enough, 
for example, to handle our 10-|- million people network. Extended Data Fig. [^ shows its 
performance. It does not outperform other centralities. 

Equal-graph-partitioning (EGP) [11] . This method aims at dividing the network 
in clusters of equal size. It can behave well for homogeneous networks, like random regular 
graph, where an equal partition could be expected to destroy the network efficiently, but 
loses a lot of performance for heterogeneous networks, like scale-free networks, as we can see 
from Extended Data Fig. Notice that we have used the same network parameters, size, 
and definition of EGP as given in m in the comparison of Extended Data Fig. In fact, 
we reproduce the same curve and qc as found for EGP in m- 


VII. COMPARISON WITH BELIEF PROPAGATION ALGORITHM OF ALTARELLI 

ET AL. [m 

The comparison with the Belief Propagation (BP) method proposed in Ref. [H] to 
optimally immunize a network deserves particular care, because this method does not apply 
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directly to the problem we are treating here. This is due to the fact that the parameter 
p in the work of Ref. [13] (which is noted as q in [T3|) refers to the fraction of initially 
infected individuals. In our work the fraction p is assumed to be zero, because in epidemic 
outbreaks the number of initiators of the epidemic is very small, and typically of order 0(1) 
|46j . For instance. Sierra Leone’s explosion of Ebola cases in 2014 appeared to stem from 
one traditional healer’s funeral at which a single source infected 14 women; or the SARS 
outbreak in 2003 started when one doctor from China infected nine other guests in a Hong 
Kong hotel who then spread the virus throughout the city and to Vietnam and Canada 
(source- NY Times August 29, 2014, page A7, “Outbreak in Sierra Leone Is Tied to Single 
Funeral Where 14 Women Were Infected.”). Another example is the patient zero-hypothesis 
in the AIDS epidemics [37] . 

On the other hand the model of Ref. [13] is valid for p > 0, in particular, the results of 
Ref. [13] are illustrated for p = 0.1. The case p = 0.1 would imply an epidemic starting 
with 10% of the entire population infected independently at the same time. This would 
imply, for instance, 0.6 million people in Sierra Leone spontaneously and independently 
being infected at the same time, which would make any targeted immunization intervention 
perform equally well in practice. This result was shown by Ref. [13] in Fig. 12a: when 
p > 0 any reasonable targeted immunization method gives the same result for the fraction 
of infected nodes vs immunized nodes. [Fig. 12a in [T3] treats the case of p = 10%, noted as 
g = 0.1 in the notation of ig, and compares BP with greedy, HDA, eigenvector centrality 
and simulating annealing; all showing the same performance]. 

Therefore, the results of our paper are illustrated for p = 0. That being said, next, 
we compare our results with the BP algorithm in the closest possible regime to ours when 
p —)■ 0, and also for p = 0.1. In the limit p —)■ 0, BP becomes unfeasible because the time 
complexity of the BP algorithm diverges as p~^ for p —)■ 0, as we explain below. The results 
are shown in Extended Data Figsj^ and IT and we observe that BP does not perform better 
than CL Furthermore, the poor scalability of BP makes it prohibitive for the real networks 
of 10-|- million people used in our work. 

To perform a comparison, we need to briefly recall the approach of Ref. m and set 
the notation. The formulation of the problem is based on the long time limit of the SIR 
dynamics, which is described by the set of variables {z/j}, i = 1,..., V, giving the probability 
for each node to be infected after the epidemic outbreak (in Ref. [33] the variable n* is called 
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rrii, but we prefer to use Ui to make contact with our notation). These variables satisfy the 
following equations: 


= p+{l-p) 


1 - 


]^(1 - WUk^i) 


(107) 


k^di 

where the parameter p is the probability for node i to be initially infected; w is the probability 
that a given neighbor k of node i transmits the disease to z; and the product on the r.h.s. 
is over all neighbours k of node i. The variable (named mk^i in Ref. [H]) is the 
probability that node k is infected in a modihed network where node i is absent. Each 
is associated with a directed edge z —>■ j of the graph, and satishes the following equation: 


= p + (1 - p) 


1 - n “ wvk^i) 

k&di\j 


(108) 


To include the effect of immunization, the authors of Ref. [m] introduce a binary variable cxj 
for each node z, taking values at = +1 if node z is immunized, and Uj = —1 if not. Equations 


(107) and (108) then become: 


1 I / \ 

SP+V--P) 

1 - CTi 


1 - (1 - WVk^i) 


k£di 


2 


p+ (1 -p) 


1 - n “ wvk^i) 

kGdi\j 


— ^Uk^i}k£di) 5 


— {^k^i}k£di\j^ ■ 


(109) 


To hnd the optimal immunization set. Ref. [H] minimizes the following cost (energy) func¬ 
tion E{a, v)\ 

N ^ 1 I ^ 

E{(y,v) = 2 ^' = X]e(ai,i/i), (110) 

2 = 1 2=1 2=1 

where /z is a chemical potential controlling the fraction of immunized nodes. At this point. 
Ref. [13] applies the cavity method to estimate the single site marginal Pj(cTj), which gives 
the probability that node z is immunized. Approximating the network with a tree rooted 
on node z, the authors of Ref. [H] derive the following equation to assess the probability 
distribution Pi(crj): 


PMi) ~ 



k^di 


( 111 ) 


where /3 is the inverse temperature, and the functions Qk^iik'k^i, ^i^k) satisfy the following 
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BP equations: 


Q i^j ( 7 -^i ) — 


i^kj 


/ n e Y[s[ui^k-Fi 

o-i ykGdi\j J kedi 

( 112 ) 

Next, we iterate the BP equations to perform a comparison with our approach. These 
equations do not have an analytical solution, so that, following Ref. [13], we solve them 
numerically by discretizing the function Qi^j{ui^j,Uj^i) in a number A4m of bins. The 
computational cost to update each message Qi^j is of order where ki is the 

degree of node i. This makes the algorithm practically unfeasible on networks having nodes 
with large degree (think e.g. to scale free graphs). To overcome this problem, the authors of 
Ref. [Ill use a convolution trick, which reduces the computational cost to 0{{ki — 


Using the convolution method of Ref. [m], Eq. (112) reads: 


Qi—(^i—) k/—>*) — ^ 




k&di\j 




1 — p \ 1 — p 1 — p J 

(113) 

where the function M^"‘\x,y) is dehned iteratively by the convolution: 

M^^\x,y) = [ da;ida ;2 S{x — X 1 X 2 ) M^^~^\xi,y) M^^\x 2 ,y) , 


M^^\x,y) = J dz/(5[a; — (1 — tcz/)] Q ^z/, 1 — (1 — ^ 


(114) 


In all the following numerical results we will always use the efficient form (113) of the BP 
equations. 

From the knowledge of the functions Qi^j{ui^j, Ref. [13] computes the probability 
distribution Qi(z/j) that node i has been infected during the epidemics [z/j is defined in the 


hrst line of Eq. (109)], which is given by: 




-/3m 


/ dz/^_^j (z//j_,.j, 0) 


.k£di ' 


5(z/j) + 


e-/3^i 

1 — p 




1 - Ui 1 - l/i 


1 — p ’ 1 — p 


0(z/i-p) . 
(115) 


Moreover, they estimate the single spin marginal Pi{cri) as: 


Pi{o-i) ~ e 


k^di 


6{a-l) + 




h(cT+l) 

(116) 


72 




















Once the authors of Ref. [H] obtained the probability distributions Qi(r'i) and Pi{<Ji), they 
can compute the average fraction of infected nodes /: 


/ = — 
N 


N 

/ dui Vi Qi{vi) , 

i=l 


(117) 


and the average fraction of immunized nodes q: 


N 


1 


2=1 2=1 


(118) 


A. BP adaptive 


Before we compare BP with our method, we need to illustrate the BP method on a 
ER network to clarify some technical issues. We consider a small ER random graph of 
N = 200 nodes, where BP can be studied, and average degree {k) = 3.5. We use the 
following values of the parameters: fraction of initially infected nodes p = 0.1, inverse 
temperature (3 = 3.0, and transmission probabilities w = 0.4, 0.5, 0.6, 0.7. The results are 
shown in Extended Data Fig. where we plot the fraction of infected nodes / versus the 
fraction of immunized nodes q. As already noticed in Ref. [H], we observe that, while for 
w = 0.4 the curve is continuous in the whole range of values of g, for larger values of w the 
curves get interrupted at a certain value of q. This is due to the fact (as mentioned by the 
authors of Ref. m) that the free-energy is non-convex in that region of values of q, and 
the chemical potential is flat as shown in Extended Data Fig. i). Therefore, all values of 
q in that region cannot be explored using the normal BP method. Physically, the fact that 
the thermodynamical potential becomes non-convex is the signature of a phase transition 
happening at a certain value of w. To overcome this problem, the authors of Ref. na 
suggest the following technique. One adds an extra magnetic held H to the energy function 


E((T, v) in Eq. (110), which is then adjusted at each update of the BP equations to keep hxed 
the value of immunized nodes q. We implemented this adaptive BP method (called ’hxed 
density BP’ in Extended Data Fig. [^), and we found that the missing part of the curve can 
be ehectively reconstructed for some values of w larger than w = 0.4. Nonetheless, for even 
bigger values of w, we found that the missing part of the curve cannot be fully reconstructed, 
since the adaptive algorithm does not converge anymore. Usually the non-convergence of 
the BP algorithm is associated to the existence of a phase transition (diherent from the 
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aforementioned one), marking the limit of validity of the replica-symmetric cavity method. 
We then expect that for those values of q, where reconstruction is impossible, a different BP 
method has to be used, in order to deal with the phenomenon of replica symmetry breaking. 

In the next Section we compare BP with Cl in two different settings. The hrst case is 
the closest one to the regime where Cl is dehned (i.e. when p —>■ 0 and w —)■ 1). The second 
type of comparison is the case where p > 0 and the BP method can be used for all values of 
ge [0,1]. 


B. Comparison 


1. First comparison 


Here, we compare BP in the closest possible regime to Cl, i.e. for p —>■ 0 and w —>■ 1. 
Solving numerically the BP equations requires to discretize the functions 
in a number of bins of the order N'bin ~ l/p, in order to have good numerical accuracy, 
because the smallest possible non-zero value assumed by z/j is Ui = p, as stated in Ref. [H] . 
The BP running time is of order M being the number of edges in the graph. 

The factor comes from the computation of the function p) in Eq. (114), that 

requires a double integration over xi and X 2 (giving a factor Af^in): for each value of y (giving 
an extra factor Nbin)- Since Mun ~ 1/p, the BP running time diverges as p“^ for p —)■ 0. 
This is the reason why we cannot use BP directly for p = 0. So, we set p = 0.01, as small as 
possible, and w = 0.99, as close to 1 as possible, in the BP algorithm. Moreover, we choose 
a quite high value of the inverse temperature f3 = 10, close enough to the zero temperature 
limit. Note that for this value of p = 10“^, the number of bins needed for good numerical 
resolution is of the order of Mun ~ 10^, which introduces a prefactor in the computational 
cost of the algorithm already of order 10®. 

We compare Cl and BP on a small ER network of iV = 10^ nodes and average degree 
{k) = 3.5, where BP can be run efficiently to do a study over the parameter space. Since for 
those values of p and w we cannot compute the full curve /(g) (for the reasons explained in 


Section VIIA), we compare the giant component found by BP and the one obtained with 


Cl (notice that / coincides with the giant component G in the limits p —)■ 0 and ta —)■ 1). 
In order to choose which nodes have to be removed according to BP, we use the following 
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criterion: we run BP and we assign to each node the value of the sign of its magnetization: 
sign( (cTj)) (the value of the inverse temperature we chose, (3 = 10, is sufficiently high for the 
spins (Tj to be highly polarized). Then, node i is removed if sign(((Tj)) = 1, and it is not 
if sign({(Ti)) = —1. When BP does not converge, we stop the algorithm after a maximum 
number of iterations and we use the unconverged marginals to assign the magnetizations. 
In this way we can draw the full curve G{q) even if BP does not converge. The result of 
the comparison is shown in Extended Data Fig. |^, where we can see that BP is not better 
than Cl, and performs slightly worst than the HDA method. 

To conclude this section, we mention two other versions of the BP algorithm. The hrst 
one is developed in Ref. |1H]. The technique used in Ref. |1H] is the same BP technique 
as the one introduced by Altarelli et al. [laiiii]. From the analytical point of view. Ref. 
[38] improves the lower bound on the threshold Qc by considering the effects of 1 step replica 
symmetry breaking (IRSB), obtaining slightly larger lower bounds than those predicted by 
the replica symmetry (RS) approach of Altarelli et al.\ or 

respectively in the notation of Ref. |3S|- Hence the lower bound in Ref. |3S] is larger than 
the one obtained by Altarelli et al. 

The second variant of the BP algorithm is used in Refs. [3^ 150] for solving the undirected 
feedback vertex set problem. This algorithm, named Belief Propagation Guided Decimation 
(BPD), improves the time complexity of the BP approach of [131 HH EH] and can be tested 
in SF networks. In Extended Data Fig. ii we compare the BPD with Cl algorithm where 
we hnd evidence of the best performance of Cl. 


2. Second comparison 


In this section we compare Cl and BP in a different way; this time in the case where BP 
is well dehned and Cl is not, for parameter values p ^ 0 and w 1. Thus, we use p = 0.1 
and w = 0.5. So, this second comparison represents the opposite situation with respect to 
the previous one. We compare the two methods in the following way. We use BP to compute 
directly the fraction of infected nodes f{q) as a function of the fraction q of immunized nodes 


by means of Eqs. (117)-(118). To compare against Cl, we have to simulate explicitly the 


SIR process, since we cannot estimate directly the /(g). More precisely, we hrst identify the 
immunized nodes with Cl, and then we run the SIR algorithm to obtain the hnal fraction 
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of infected individuals f{q). 

The result of the comparison is reported in Extended Data Fig. for an ER network of 
N = 10^ nodes and average degree (k) = 3.5. The values of the initially infected individuals 
p and the transmission probability w are p = 0.1 and w = 0.5. The value of the inverse 
temperature /? used in the BP algorithm is /? = 10 (for the portion of the curve where the 
adaptive BP algorithm is needed, we chose the lowest possible temperature such that the 
algorithm converges). As the figure shows, there is little difference between BP and Cl, with 
Cl slightly better for small q. Moreover we checked that even using HDA gives more or less 
the same results as BP and Cl, as the authors of Ref. [H] also show in Fig. 12a of their work. 
Therefore, in the case when p > 0 (meaning that a hnite fraction of the entire network is 
already infected from the very beginning of the epidemic outbreak), any reasonable targeted 
immunization technique gives the same result. That is, the optimization achieved by any 
method is washed out by the large number of already infected people, and all strategies 
perform equally well. On the contrary, in the case when p = 0, i.e. when the epidemic is 
initiated by a superspreader event 0(1), different strategies behave very differently, with Cl 
being the best so far. 

We notice, en passant, that the analytical BP estimation of f{q) gives a lower bound on 
the actual f{q). That is, if we used the same procedure as for Cl, by hrst identifying the 
immunized nodes and then computing the fraction of infected ones trough the outcome of 
the SIR process, the resulting curve f{q) would he above the analytical BP estimation. 

Finally, we note that EO estimates the optimal numerical value of the threshold qc as 
a numerical extrapolation to N ^ oo and i ^ oo. While EO can estimate this threshold 
accurately (providing an upper bound very close to the real optimum), it cannot provide the 
actual optimal conhguration n* for large system sizes. This is of course a general feature 
due to the NP-hardness of the problem. 

Indeed, the EO method we use to estimate the value of optimal threshold for ER random 
graphs in Extended Data Fig. i) may not be the only way to assess analytically that result. 
Indeed, there are other methods to approximate the location of the optimal threshold, which 
can provide lower or upper bounds. For instance, the BP (or cavity) method investigated 
above writes down approximate self-consistent equations for the optimization problem, that 
are solved iteratively to get an estimation of the optimal threshold. Often, the BP equations 
do not converge (as a consequence of the NP-hardness of the problem), but an attitude has 
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gained a foothold in the statistical physics community, which amounts to ignore convergence 
problems and use anyway an unconverged solution as an estimation of the optimal threshold. 
Indeed, in all cases where this approach has been pursued, it has been shown that the BP 
analytical prediction provides a lower bound to the optimal threshold. On the contrary, 
the EO method employed in our work provides an upper bound to the optimal threshold. 
Therefore, different analytical methods can give, indeed, predictions which are close to each 
other and, hence, close to the optimal value of the threshold. 

Furthermore, it may not be impossible to hnd the exact analytical value of the threshold 
even if the problem is NP, as in the case of the Sherrington-Kirkpatrick model for spin glasses 
[36] . where the Paris! ansatz provides the correct solution. We also notice that analytical 
solutions are based on the analysis of the most probable case in general, but not for a specihc 
instance of the problem. Indeed, not every NP-complete problem can be analysed in this 
way. Some problems do not permit a discussion based on the most probable case. A random 
chosen satishability problem, for example, is almost always easy to solve, because a random 
sequence of symbols almost always does not make sense. 

In our problem of optimal percolation, even though the numerical value of the threshold 
could be known exactly with EO or other method, the main problem remains open; Ending 
an optimal configuration that is as close to the minimal as possible in the large system size. 
The most relevant challenge for practical applications of NP problems is not to estimate 
theoretically the value of the threshold qc, but to find a scalable algorithm (for realistic 
applications should be at most 0(A^log A^)) which is able to approximate as close as possible 
a real optimal configuration n* at Qc- 

Our algorithmic solution to this NP problem is then Cl: a scalable algorithm ~ 0{N log N) 
that contains the physics of the optimal configuration, and it is necessarily an approxima¬ 
tion to the true optimum; being a 0(A^log A^) algorithm it cannot give the optimal solution 
unless P = NP. Thus, proper benchmarking does not compare the analytical value of the 
threshold Qc- Benchmarking should be carried out by comparing the optimal configurations 
with the corresponding giant components for large size networks, which should be at least 
of the order of 10^-|- nodes (as we have done in Fig[^), showing an improvement both in 
the running time and efficiency. 
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VIII. A NEW PARADIGM OF INFLUENCE IN SOCIAL MEDIA: TWITTER 


In the next two sections we show that the performance of our method is confirmed in 
two real networks. We study two prototypical examples of real networks: Twitter web and 
a social network derived from phone calls. The former is used to test our theory as a new 
paradigm of influence, while the second can be used to design an immunization protocol in 
the case of an epidemic outbreak. 

We have paved the way to explore the consequences of our theory in real networks, where 
the assumption of tree-like structure that is the basis of our theory is not necessarily satisfied. 
The reason to be interested in such a kind of problem is that it is manifestly in the interest of 
man’s communal existence to understand how people increase their influence when they tie 
one another. The critical question is to what extent one can define a measure for influence 
solely on the basis of social contact network. The answer might be hard to hnd, but, at the 
same time, one cannot deny that the network itself mirrors the mutual relations of users, 
and hence it must contain information about their influence. The resulting network-based 
influence estimation can always be supplemented by measures of activity and engagement. 

With this caveat in mind, our optimal percolation theory uncovers the optimal influencers 
in social media. In this context, the measure of node-influence in social media is the drop 
in the size of the giant cluster which would happen if the node in question were removed. 
Such a measure of influence is related to the ability to spread the news to the largest portion 
of the network as shown by our mapping of the maximal spreading problem in LTM (with 
di = ki — 1) to optimal percolation. We test this idea in Twitter, next. 

Twitter is the online social networking and microblogging service that has gained world¬ 
wide popularity. Here we use the dataset of approximately 16 million tweets sampled 
between January 23rd and February 8th, 2011 and publically shared by Twitter (http: 
//tree .nist.gov/data/tweets/) (also available at http://jamlab.org, see Ref. [19] for 
more details). The natural way to get the social network is to extract the follower net¬ 
work through Twitter API. Unfortunately, due to the access rate limit of Twitter API, it 
is impossible to obtain the full information of the follower network in a reasonable time. 
Furthermore, many of the follower links are not active. To approximate the social network, 
we use an alternative way - the mention network [T9|. In contrast to the normal tweets, 
mentions are tweets containing ©username and usually include personal conversations or 
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references. In fact, the mention links have stronger strength of ties than follower links. 
Therefore, the mention network can be viewed as a stronger version of interactions between 
Twitter users. In the mention network, if user i mentions user j in his/her tweets, there 
exists a link from i to j. In order to better represent the social contacts, we also add to 
the network the retweet relations from the tweets. A retweet (RT ©username) corresponds 
to content forward with the specihed user as the nominal source. If user i retweets a tweet 
of user j, then a contact is established between j and i. We then consider all links to be 
undirected. In this way, the social network of Twitter is constructed. The resulting network 
has N = 469,013 nodes and M = 913,457 edges. 

We measure the collective influence of a group of nodes as the drop in the size of the 
giant component which would happen if the nodes in question were removed. The results 
are shown in Fig. |^, showing the better performance of Cl in comparison with HDA, PR, 
HD and k-core. The other heuristics and BP cannot be run in this large dataset. 

In Fig. we plot the percentage of fake influencers (PFI) or false positives as a function 
of the fraction of removed nodes q. This quantity is dehned with respect to the HD method, 
and represents the amount of different influencers between HD and CL More precisely, we 
call Sci{q) the set of influencers (i.e. removed nodes) found by Cl at a given value of q: 


and S'hd(<?) fhe corresponding vector for HD. Moreover we denote by \S{q)\ = Nq the size 
of the set. Notice that |S'ci(q')| is upper bounded by Nq'^^, i.e. |5'ci(q')| < Nq'^^, where 
gP is the influence threshold obtained with Cl. Indeed Nq^^ is the maximum number of 
influencers. Analogously, |S'HD(g)| is upper bounded by 
We dehne PFI(g) as: 


PFI(g) = 100 X 


l*5'ci(g) n >5'HD(g)| 
Nq 


( 120 ) 


In other words, we measure the percentage of different nodes removed by Cl and HD. 
As shown in Fig. &. the PFI at the critical threshold of Cl is PFI(g^^) ~ 26%, meaning 
that HD misses roughly 1/4 of the total number of (real, i.e. optimal) influencers. As a 
consequence the giant component for HD is still very large, GHD(g^^) ~ 0.37, and hence, HD 
needs to keep on removing nodes to fragment completely the network. This comes at the 
price of including a large number of fake influencers at the end of the process g™, where 


PFI(g, 


,HD^ 


48%. 
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In the same way, if one knew the true (optimal) conhguration of influencer, one could 
analogously dehne the fake Cl influencers which do not overlap with the optimal ones. 
Actually, the obtained nearly optimal set by the Cl method has an unknown overlap with 
the true optimal solution. On the other hand, the impossibility to hnd such an optimal set 
(because of the aforementioned prohibitive running time), makes the Cl influencer set the 
natural substitute for the optimal one, and, hence, the reference set for studying the overlap 
with node sets identihed by other methods. 


IX. HALTING EPIDEMICS: MOBILE PHONE CALL NETWORK 

While medicine has made solid advances in the isolation of new vaccines for an increasing 
number of diseases, and may expect to make still greater ones, no certain claim can be 
established for a corresponding advance in preventive immunization. 

It is of deep social importance to have a fast and optimal intervention strategies when 
new outbreaks of disease break out. Prevention methods are still limited for various reasons: 
many virus are responsible of diseases of animals that can be transmitted to humans and 
thus causing epidemics. It is difficult, if not impossible, to control the populations of vectors 
and natural reservoirs, or predict what changes in the environment can favor the epidemics. 
The development of new drugs is usually not the solution to the problem. 

It is generally accepted that an efficient way to hght epidemic diseases is the execution 
of immunization protocols and fast quarantine procedures, together with the spread of the 
knowledge of these dangers and the efforts to remove the environmental causes that favor 
them |l6]. Probably a certain percentage of diseases will always remain undefeated, but if 
only one can succeed in reducing to a minority the majority that is today vulnerable, one 
will have accomplished a great deal, perhaps indeed everything that can be accomplished. In 
this situation is highly desirable to have a guiding strategy which enables to select who must 
be vaccinated or put in quarantine. Our theory offers a protocol of selection, the closest 
one to the optimal. This result is important because immunization doses can be limited 
or very expensive in practice, and without an optimal distribution these resources can go 
inadvertently to waste. 

To investigate the applicability of Cl to an immunization/quarantine scheme in a real 
large-scale social network we consider a social contact network built from the mobile phone 
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calls between people in Mexico. Data has been provided by GranData. 

A mobile phone call social network reflects people’s interactions in social lives, and is 
generally accepted as a proxy of a human contact network. For example, the mobile phone 
network from Mexico can help us to design effective immunization strategies, by identifying 
the most relevant social contacts among people. The disease spreads through direct contacts 
of infected people and proximity and mobility data from mobile phone networks can serve as 
a proxy of human movements and possible spreading patterns in human contact networks. 

In order to build the network, we put a link between two people if there is a reciprocal 
exchange of phone calls between them in a observation window of three months (i.e. a 
call in both directions), and the number of such reciprocal calls is larger than or equal to 
three. This criterion gives us a network of iV = 14, 346, 653 nodes, with an average degree 
{k) = 3.53 and a maximum degree fcmax = 419. The result of the Cl algorithm, compared 
to HD and HDA, is shown in Fig. [^. 

The phone call network is the prototype of big-data, where a scalable (i.e. almost linear) 
algorithm is mandatory. Indeed, the size of this network already rules out many heuristic 
methods with quadratic (or larger) running time (CC, EC, BC, and EGP) and also BP. From 
the perspective of performance. Cl is better by a very good margin. Indeed, it fragments the 
network using about 500, 000 people less than the best heuristic strategy (HDA) implying a 
saving of the same number of vaccines in a hypothetical immunization campaign. Moreover, 
when Cl gives a zero giant component, HDA gives still G ~ 0.3, i.e. a connected network of 
~ 4 X 10® people. This result, together with the result on Twitter, indicates that, although 
the theory has been developed for a locally-tree like network, in real networks with loops 
the Cl-algorithm performs quite well as well. 
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